Preface


It is probably true quite generally that in the history of human thinking
the most fruitful developments frequently take place at those points
where two different lines of thought meet. Hence, if they actually meet,
that is, if they are at least so much related to each other that a real
interaction can take place, then one may hope that new and interesting
developments may follow.

Werner  Heisenberg 

This volume contains papers presented at the July 1991 NATO Advanced
Study Institute Probabilistic and Stochastic Methods in Analysis with
Applications. The conference was held at the beautiful Il Ciocco resort near
Lucca, in the glorious Tuscany region of northern Italy. The dynamic
interaction between world-renowned scientists from the usually disparate
communities of pure mathematicians and applied scientists, which occurred at
our 1989 ASI, Fourier Analysis and its Applications, continued at this meeting.

Probability has been an important part of mathematics for more than
three centuries. Moreover, its importance has grown in recent decades with
continuing increases in computational power. Faster and more powerful
digital computers, now readily available to almost all scientists, have enabled
them to use probabilistic and stochastic techniques to attack real-world
problems not considered feasible only a few years ago. This approach has been
used in such engineering areas as speech and image processing (including
the recent approaches employing wavelets), geophysical exploration, radar,
sonar, etc., and was a major focus of our ASI.

Among the papers to be found herein on these subjects are three
exceptionally clear expositions on wavelets, frames, and their applications by
John Benedetto, Stéphane Jaffard, and Stéphane Mallat; an illuminating
description of holography and other image processing techniques by Walter
Schempp; and interesting works on sampling theory and methods by Charly
Gröchenig, Bill Heller, Christian Houdré, Keh-Shin Lii, and Tapan Sarkar.

Part of the conference was devoted to the connections between
probability and partial differential equations, an area of extremely active
current research. The reader will see how these fields have united, yielding
new insight into known analytic facts, such as probabilistic representations of
solutions to elliptic and parabolic PDEs. Furthermore, this unification is
providing both new and simplified approaches to classical problems in
probability, such as the PDE method for large deviation problems. Highlights
of this section of the proceedings are in-depth introductions to stochastic
optimal control and filtering theory, both new research fields of particular
interest for applications, presented by two recognized experts, Piermarco
Cannarsa and Gopinath Kallianpur.

Another part of the conference dealt with the application of
probabilistic techniques to mathematical analysis. The lovely paper by
Jean-Pierre Kahane, a true pioneer in this field, is a standout among the many
wonderful works in this volume. Babar Saffari, describing the use of
probability methods in Fourier analysis, presents a very complex subject with
exceptional clarity.

Finally, there are several papers which are difficult to categorize but
a joy to read. Two such are Gavin Brown's clear explanation of normal
numbers and dynamical systems, and Don Newman's thought-provoking
foray into those aspects of probability which have a profound influence upon
our daily lives.

The cooperation of many individuals and organizations was required
in order to make the conference the success that it was. The financial sponsors
are listed on the 'Acknowledgements' page. In addition, I wish to express
my sincere appreciation to my assistants, Marcia and Jennifer Byrnes and
Nicole Conte, for their invaluable aid. I am also grateful to Kathryn
Hargreaves and Karl Berry, our TeXnicians, for their superlative work on the
printed and emailed aspects of the conference, from the initial application
to this volume. Their extraordinary effort in TeXing these proceedings,
resulting in one of the few NATO proceedings where all papers are […]
typeset, deserves special acclamation. Finally, my heartfelt thanks to the Il
Ciocco staff, especially Bruno Giannasi and Alberto Suffredini, for offering
an ideal setting, not to mention the magnificent meals, that promoted the
productive interaction between the participants of the conference. All of the
above, the other speakers, and the remaining conferees made it possible for
our Advanced Study Institute, and this volume, to fulfill the stated NATO
objectives of disseminating advanced knowledge and fostering international
scientific contacts.
Wavelets and fractals


Wavelets  and  analysis  of  partial  differential  equations 


Stéphane Jaffard
C.E.R.M.A.

Ecole Nationale des Ponts et Chaussées
La Courtine, 93167 Noisy-le-Grand, France

sj@antigone.enpc.fr

We describe the main properties of decompositions in orthonormal bases
of wavelets. We then apply them to the theoretical and numerical study of
some partial differential equations.


1.  Introduction 

In the seventies and the eighties, alternative methods to Fourier analysis
appeared independently in many fields of science and technology. Let us
mention oil detection, analysis of speech, quantum mechanics, image
analysis, analysis of turbulent flows, multigrid methods, the theory of
interpolation between functional spaces, the propagation of singularities of
nonlinear partial differential equations (PDEs), etc.

Wavelets comprise a mathematical tool which lies behind these new
methods. We have two purposes in this paper. We give a survey of the
construction of wavelets and related orthonormal bases, and we also show
how certain specific properties of wavelets make them an important tool
in the theoretical and numerical study of PDEs. We also give at the end a
large bibliography.

2.  Localization  in  the  phase  space 

The  mathematical  evolution  that  led  to  wavelets  and  related  constructions 
can  be  interpreted  as  the  construction  of  successive  bases  of  functions  with 
the  following  aim:  the  decomposition  on  these  bases  yields  the  sharpest 
possible  information  on  the  time  and  frequency  behavior  of  the  analysed 
signal  or  function. 

Such constructions are important in signal analysis (a recording of
speech or music clearly contains localized parts which have a specific
frequency), in quantum mechanics (to study probability waves) or in the study
of partial differential operators.

The first step was obtained by the Fourier series. The two main
drawbacks of Fourier analysis are that it is not local and that it is difficult
to use when dealing with spaces other than $L^2$ or the Sobolev spaces.

The problem of having a stable decomposition for spaces other than $L^2$
led Alfred Haar to the so-called Haar system, constructed as follows.

Let $\varphi$ be the characteristic function of $[0, 1/2[$ and
$\psi(x) = \varphi(x) - \varphi(x - 1/2)$. The collection of all the
$\psi_{j,k}$ ($j \in \mathbb{Z}$, $k \in \mathbb{Z}$), defined by

$$\psi_{j,k}(x) = 2^{j/2}\,\psi(2^j x - k),$$

forms an orthonormal basis of $L^2(\mathbb{R})$, and the decomposition makes
sense also in the $L^p$ spaces. The drawback is that the decomposition on this
system does not give sharp frequency information, since the function $\psi$
does not have a good frequency localization. Wavelets provide a way to avoid this.
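The orthonormality of the Haar system can be checked numerically. The sketch below is an illustration only: the function names, the integration interval $[-4, 4]$ and the grid size are assumptions made for the example. It evaluates inner products of the $\psi_{j,k}$ with a midpoint rule on a dyadic grid, which is exact (up to rounding) for these piecewise constant functions as long as their breakpoints fall on the grid.

```python
import numpy as np

def haar_psi(x):
    """psi(x) = phi(x) - phi(x - 1/2), with phi the indicator of [0, 1/2)."""
    x = np.asarray(x, dtype=float)
    first_half = ((x >= 0.0) & (x < 0.5)).astype(float)
    second_half = ((x >= 0.5) & (x < 1.0)).astype(float)
    return first_half - second_half

def haar_psi_jk(x, j, k):
    """psi_{j,k}(x) = 2^{j/2} psi(2^j x - k)."""
    return 2.0 ** (j / 2) * haar_psi(2.0 ** j * np.asarray(x, dtype=float) - k)

def inner(j1, k1, j2, k2, lo=-4.0, hi=4.0, n=2 ** 14):
    """Midpoint-rule inner product on [lo, hi]; the cell size here is 2^-11,
    so every breakpoint of psi_{j,k} with j <= 10 lies on a cell boundary."""
    h = (hi - lo) / n
    x = lo + (np.arange(n) + 0.5) * h
    return float(np.sum(haar_psi_jk(x, j1, k1) * haar_psi_jk(x, j2, k2)) * h)

if __name__ == "__main__":
    print(inner(0, 0, 0, 0))   # same function: norm squared
    print(inner(0, 0, 1, 0))   # different scales
    print(inner(1, 0, 1, 1))   # same scale, different positions
```

Only functions whose support lies inside the chosen interval are integrated correctly, so the indices used in such a check have to be picked accordingly.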

Before  describing  the  constructions  of  wavelet  and  wavelet-type  bases, 
let  us  give  some  general  results  on  "doubly-localized"  orthonormal  bases. 

T. Steger proved [3] that $L^2(\mathbb{R})$ does not admit a basis of the
following form

$$f_j(x) = e^{i\xi_j x}\, g_j(x - a_j)$$

where the $g_j$ would be such that $\sup_j \|g_j\|_\varepsilon < \infty$ for
an $\varepsilon > 0$, where

$$\|g\|_\varepsilon = \int (1 + x^2)^{1+\varepsilon}\, |g(x)|^2\, dx
+ \int (1 + \xi^2)^{1+\varepsilon}\, |\hat g(\xi)|^2\, d\xi.$$

The optimal result was obtained by J. Bourgain, who found a basis where this
estimate holds with $\varepsilon = 0$ (see [3]).

Actually, if we accept mixing positive and negative frequencies of the same
value in the same function, this obstruction no longer stands, and there
exists an orthonormal basis of $L^2(\mathbb{R})$ of the following form (see [9])

$$\psi_{0,n}(x) = \varphi(x - n),$$

$$\psi_{l,n}(x) = \sqrt{2}\,\varphi(x - n/2)\cos(2\pi l x) \quad \text{if } l \neq 0,\ l + n \in 2\mathbb{Z},$$

$$\psi_{l,n}(x) = \sqrt{2}\,\varphi(x - n/2)\sin(2\pi l x) \quad \text{if } l \neq 0,\ l + n \in 2\mathbb{Z} + 1,$$

where $\varphi$ and $\hat\varphi$ have exponential decay.

The  fact  that  we  do  not  try  to  separate  positive  and  negative  frequencies 
of  the  same  amplitude  means,  in  the  signal  analysis  terminology,  that  we 
study  the  real  signal,  and  not  the  corresponding  analytical  signal. 

Independently, H. Malvar (see [25]) obtained a basis of the following
similar form

$$u_{k,l}(x) = w(x - l)\,\sin\left[\pi\left(k + \tfrac{1}{2}\right)(x - l)\right],$$

where $w$ is compactly supported.

R. Coifman and Y. Meyer generalized this construction into the
completely adaptive form that follows (see [4])

$$u_{k,l}(x) = \sqrt{\frac{2}{L_l}}\; w_l(x)\,\sin\left[\pi\left(k + \tfrac{1}{2}\right)\frac{x - a_l}{L_l}\right],$$

where $a_l$ is an increasing sequence of real numbers such that
$a_l \to +\infty$ when $l \to +\infty$ and $a_l \to -\infty$ when
$l \to -\infty$; $L_l = a_{l+1} - a_l$, and $w_l$ is compactly supported,
essentially on the interval $[a_l, a_{l+1}]$.

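More precisely, let $\varepsilon_l$ satisfy
$a_l + \varepsilon_l \leq a_{l+1} - \varepsilon_{l+1}$. Then one chooses
$w_l$ such that

• $0 \leq w_l(x) \leq 1$,

• $w_l(x) = 1$ on $[a_l + \varepsilon_l,\ a_{l+1} - \varepsilon_{l+1}]$,

• $w_l(x) = 0$ outside $[a_l - \varepsilon_l,\ a_{l+1} + \varepsilon_{l+1}]$,

• if $x \in [a_l - \varepsilon_l,\ a_l + \varepsilon_l]$, then $w_l(x) = w_{l-1}(2a_l - x)$ and $w_l^2(x) + w_{l-1}^2(x) = 1$.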

Notice that in order to compute the coefficients of a function on such
a basis, we need to perform a pointwise multiplication, and then to
compute Fourier coefficients. Clearly, the whole decomposition is obtained in
$O(N \log N)$ operations.

We have here a huge collection of orthonormal bases: roughly speaking,
as many as the possible partitions of $\mathbb{R}$ by segments of arbitrary
length. This richness will be used for data compression: for a given signal,
we want to determine a basis on which the signal has the "smallest"
decomposition. For that, we need an algorithm which allows us to go easily
from one basis to another. Let us describe the following recipe due to
V. Wickerhauser ([5]).

Let $A_l$ be the space spanned by the $(u_{k,l})_{k \in \mathbb{Z}}$ (which
are the functions corresponding to a given window). The space
$A = A_l \oplus A_{l+1}$ has exactly the same structure as a space $A_m$,
with a window between $a_l$ and $a_{l+2}$, and a function $w$ which is

$$w(x) = \sqrt{w_l^2(x) + w_{l+1}^2(x)}.$$

Hence, we can replace two adjacent windows by a larger one without
changing anything else.

The algorithm of representation of a function is the following: for a
given $f$, we start with its decomposition using small windows all of the same
width, and we merge two such windows when there is an advantage in
doing so. We iterate this procedure as long as needed, and obtain at the end
a segmentation adapted to the signal. We still have to choose a criterion for
deciding when we merge two windows together. The one chosen is given
by a kind of "entropy minimization".

Suppose we have a family of orthonormal bases $e_i^{(\alpha)}(t)$. For each
$\alpha$ the signal $f$ is decomposed on the corresponding orthonormal basis by

$$f(t) = \sum_i c_i^{(\alpha)}\, e_i^{(\alpha)}(t)$$

and we want to minimize the entropy

$$E^{(\alpha)} = -\sum_i |c_i^{(\alpha)}|^2 \log |c_i^{(\alpha)}|.$$

At each step, we calculate the entropy for the two windows and for the large
one, and we merge the windows if the corresponding entropy decreases.
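The merging rule can be sketched in a few lines. This is a simplified illustration, not the construction described above: it assumes sharp (rectangular) windows and uses an orthonormal DCT-IV on each segment as a stand-in for the smooth windows $w_l$, and the names `entropy`, `dct4` and `merge_windows` are made up for the example. Only the decision rule (merge two adjacent windows when the entropy of the merged window is smaller than the sum of the two separate entropies) is taken from the text.

```python
import numpy as np

def entropy(c):
    """E = -sum_i |c_i|^2 log|c_i|, skipping zero coefficients."""
    a = np.abs(np.asarray(c, dtype=float))
    a = a[a > 0]
    return float(-np.sum(a ** 2 * np.log(a)))

def dct4(x):
    """Orthonormal DCT-IV of a segment (assumed stand-in for a windowed sine basis)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    idx = np.arange(n) + 0.5
    basis = np.sqrt(2.0 / n) * np.cos(np.pi / n * np.outer(idx, idx))
    return basis @ x

def merge_windows(f, width):
    """Start from windows of equal width and greedily merge adjacent pairs
    whenever the merged window has a smaller entropy. Returns the boundaries."""
    f = np.asarray(f, dtype=float)
    bounds = list(range(0, len(f) + 1, width))  # len(f) assumed a multiple of width
    merged = True
    while merged:
        merged = False
        i = 0
        while i + 2 < len(bounds):
            a, b, c = bounds[i], bounds[i + 1], bounds[i + 2]
            split_cost = entropy(dct4(f[a:b])) + entropy(dct4(f[b:c]))
            joint_cost = entropy(dct4(f[a:c]))
            if joint_cost < split_cost:
                del bounds[i + 1]   # replace the two windows by the larger one
                merged = True
            else:
                i += 1
    return bounds

if __name__ == "__main__":
    t = np.arange(64)
    signal = np.cos(np.pi / 64 * (t + 0.5) * 5.5)
    print(merge_windows(signal, 8))
```

Because the entropy is additive over windows, each merge decision only needs the coefficients of the two windows involved, which is what makes this kind of search cheap.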

3.  Construction  of  wavelets  and  wavelet  packets 

3.1.  Multiresolution  analysis 

We shall now describe another collection of bases, which are a generalization
of the classical wavelets, and will also supply a family of orthonormal bases
for which the same entropy minimization algorithms are used. But let us
first recall the construction of wavelets by a multiresolution analysis and the
fast wavelet decomposition, both introduced by S. Mallat (see [24]). We shall
stick to dimension 1 for the sake of simplicity. A multiresolution analysis
is an increasing sequence $(V_j)_{j \in \mathbb{Z}}$ of closed subspaces of
$L^2(\mathbb{R})$ such that

• $\bigcap_j V_j = \{0\}$,

• $\bigcup_j V_j$ is dense in $L^2(\mathbb{R})$,

• $f(x) \in V_j \Leftrightarrow f(2x) \in V_{j+1}$,

• $f(x) \in V_0 \Leftrightarrow f(x + 1) \in V_0$,

• there is a function $g$ in $V_0$ such that the $g(x - k)$, $k \in \mathbb{Z}$, form a Riesz basis of $V_0$.

We also require $g$ to be smooth and well localized.

A simple example of multiresolution analysis is obtained by taking for
$V_j$ the space of continuous and piecewise linear functions on the intervals
$[k2^{-j}, (k + 1)2^{-j}]$, $j, k \in \mathbb{Z}$. A possible choice for $g$ is
the "hat" function, which is the function of $V_0$ taking the value 1 for
$x = 0$ and vanishing at the other integers. It is easy to orthonormalise the
set $g(x - k)$ by choosing

$$\hat\varphi(\xi) = \hat g(\xi)\left(\sum_k |\hat g(\xi + 2k\pi)|^2\right)^{-1/2}.$$

Then, the $\varphi(x - k)$ form an orthonormal basis of $V_0$.

Define $W_j$ as the orthogonal complement of $V_j$ in $V_{j+1}$. One
immediately checks that the $W_j$ are mutually orthogonal, and their direct
sum is equal to $L^2(\mathbb{R})$. By a procedure similar to the one which led
to the construction of $\varphi$, we can obtain a function $\psi$ such that
the $\psi(x - k)$ form an orthonormal basis of $W_0$. Since the $W_j$ are
obtained from each other by dilation, and are mutually orthogonal, the
functions $2^{j/2}\psi(2^j x - k)$ form an orthonormal basis of
$L^2(\mathbb{R})$. Let us now come back to the orthogonal decomposition

$$V_0 = V_{-1} \oplus W_{-1}. \tag{3.1}$$

We have two orthonormal bases of $V_1$: the first one is the
$\sqrt{2}\,\varphi(2x - k)$, $k \in \mathbb{Z}$, and the second one is the
union of the $\varphi(x - k)$ and $\psi(x - k)$.

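The existence of two bases implies the existence of an isometry mapping
the coordinates in the first basis on the coordinates in the second basis. Let
us describe this isometry more precisely. Let $h_k$ and $g_k$ be

$$h_k = \int \varphi(x/2)\,\varphi(x - k)\,dx,$$

$$g_k = \int \psi(x/2)\,\varphi(x - k)\,dx.$$

Let $f \in V_0$ and let us write its decomposition on the two bases of $V_0$
given by (3.1):

$$f(x) = \sum_k c_k^0\,\varphi(x - k)
= \sum_k c_k^1\,\tfrac{1}{\sqrt{2}}\,\varphi(x/2 - k) + \sum_k d_k^1\,\tfrac{1}{\sqrt{2}}\,\psi(x/2 - k),$$

so that

$$c_l^1 = \frac{1}{\sqrt{2}} \sum_k c_k^0\, h_{k-2l}$$

and, similarly,

$$d_l^1 = \frac{1}{\sqrt{2}} \sum_k c_k^0\, g_{k-2l}.$$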

Suppose now that a signal is given by a sequence of discrete values $c_k^0$.
We can consider them as the coefficients of a function of $V_0$ on the
$\varphi(x - k)$. The isometry transforming the sequence $c^0$ into
$(c^1, d^1)$ can be written $F = (F_0, F_1)$, where $F_0$ and $F_1$ commute
with even translations: they are discrete convolutions where we only keep
every other term. In the terminology of signal analysis, $F_0$ and $F_1$ are
said to be quadrature mirror filters. This notion was introduced in 1977 by
D. Esteban and C. Galand for improving the quality of digital transmission of
sound. Iterating $|j| - 1$ times the filter $F_0$ and then applying once the
filter $F_1$, we obtain the coefficients on the $\psi_{j,k}$ for $j < 0$. Each
level requires only a discrete convolution. This algorithm constitutes the
fast wavelet transform (see [24]). Of course the decision to apply $F_0$ $n$
times and $F_1$ once is rather arbitrary, and we could decide to randomly
apply $F_0$ and $F_1$ a certain number of times. This idea leads to wavelet
packets, which are a family of orthonormal bases of $L^2(\mathbb{R})$
corresponding to all the "admissible" ways of applying these filters.
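As an illustration of the filtering step, here is a minimal sketch of the formulas $c_l^1 = 2^{-1/2}\sum_k c_k^0 h_{k-2l}$ and $d_l^1 = 2^{-1/2}\sum_k c_k^0 g_{k-2l}$. It assumes the Haar multiresolution analysis with scaling function $\chi_{[0,1)}$, for which $h_0 = h_1 = 1$ and $g_0 = 1$, $g_1 = -1$ follow from the integrals defining $h_k$ and $g_k$; the function names and the periodic treatment of the signal are choices made for the example.

```python
import numpy as np

# Haar filters (assumption: phi = indicator of [0, 1), psi = Haar wavelet).
H = np.array([1.0, 1.0])    # h_0, h_1
G = np.array([1.0, -1.0])   # g_0, g_1

def analysis_step(c, filt):
    """out_l = 2^{-1/2} sum_k c_k filt_{k-2l}: a discrete convolution in which
    only every other term is kept. The signal is treated as periodic."""
    c = np.asarray(c, dtype=float)
    n = len(c)                       # assumed even
    out = np.zeros(n // 2)
    for l in range(n // 2):
        for m, coeff in enumerate(filt):
            out[l] += coeff * c[(2 * l + m) % n]
    return out / np.sqrt(2.0)

def fast_wavelet_transform(c0, levels):
    """Apply F_0 repeatedly, taking the F_1 output at each level."""
    approx = np.asarray(c0, dtype=float)
    details = []
    for _ in range(levels):
        details.append(analysis_step(approx, G))   # F_1: wavelet coefficients
        approx = analysis_step(approx, H)          # F_0: coarser approximation
    return approx, details

if __name__ == "__main__":
    coarse, details = fast_wavelet_transform([4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 0.0, 2.0], 3)
    print(coarse, details)
```

Since each level halves the length of the sequence passed on, the total cost of this loop is proportional to the length of the input signal.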

Let us come back to our initial problem of finding bases well localized
in phase space. The adaptive Fourier windows and the wavelet packets do
not give a satisfactory answer to this problem; it is therefore remarkable that,
though they do not have this type of localization, they provide very efficient
data compression algorithms. But the problem of finding an adaptive
algorithm which gives good localization in phase space is still open (by
good, we mean comparable to the localization of any of the commonly
used wavelet bases). Because of this lack of localization, up to now only
the "classical wavelets" have been used to study operators and PDEs, and
therefore, only these bases will appear in the following.

4.  Analysis  of  partial  differential  equations 

4.1.  Wavelet  method  for  elliptic  problems 

Consider an elliptic problem, such as the Poisson problem, on a bounded
domain. Let us first recall some properties of its resolution by Galerkin
methods based on finite elements or by finite differences.

One of the main difficulties in these methods is that, once the problem
has been properly discretized, one has to solve a system which is
ill-conditioned. Typically, for a second order elliptic problem in two
dimensions, one obtains a matrix $M$ such that

$$\kappa = \|M\|\,\|M^{-1}\| = O(1/h^2)$$

where $h$ is the size of the discretization (see [28]). Such ill-conditioning has
two drawbacks: it leads to numerical instabilities and to slow convergence for
iterative resolution algorithms. In order to avoid this problem, one usually
uses a preconditioning, which amounts to finding an easily invertible matrix
$D$ such that the preconditioned matrix (for example $D^{-1}M$, depending on
the method used) will have a better condition number $\kappa$. For the example
we considered, the usual preconditioning methods on general domains (SSOR or
DKR on a conjugate gradient method, for instance) make $\kappa$ become
$O(1/h)$. We shall give a wavelet method for which $\kappa = O(1)$ (see [13]).
This result requires the construction of wavelets adapted to the domain
$\Omega$ (see [17]); they are an orthonormal basis of $L^2(\Omega)$ composed of
functions $\psi_{j,k}$ ($j \geq 0$) such that

$$|\partial^\alpha \psi_{j,k}(x)| \leq C\, 2^{j|\alpha|}\, 2^{nj/2} \exp\left(-\gamma\, 2^j\, |x - k2^{-j}|\right)$$

for $|\alpha| \leq 2m - 2$, and a positive $\gamma$.

The decay estimates show that $\psi_{j,k}$ and its partial derivatives are
essentially centered around $k2^{-j}$ with a width $2^{-j}$. In the following,
wavelets will be indexed by $\lambda = k2^{-j}$.

Actually, though these wavelets are not the same as in the case
$\Omega = \mathbb{R}^n$, they are "almost" the same; that is, numerically, only
the wavelets that are close to the boundary are modified. Thus, we can
essentially keep the fast decomposition algorithms, with only small
modifications near the boundary.

Let us now describe the method of resolution for the Poisson equation.
It is performed by a standard Galerkin method, keeping all the wavelets
up to a frequency $j_0$. If we solve a Laplacian on a domain with Dirichlet
boundary conditions, we have to invert a matrix

$$M_{\lambda,\lambda'} = \int_\Omega \nabla\psi_\lambda \cdot \nabla\psi_{\lambda'}\, dx.$$

We now renormalize the wavelets for the Sobolev $H^1$ norm, that is, we
consider that the functions on which the problem is discretized are the

$$\varphi_\lambda = 2^{-j}\psi_\lambda.$$

The condition number of the corresponding matrix is then bounded
independently of the discretization size $h = 2^{-j_0}$. Thus, a conjugate
gradient method will converge in a bounded number of steps, no matter how
precise we require the solution to be. We shall explain this result in the next
part by comparing it with multigrid algorithms. Let us also mention that, if we
use smooth wavelets, the order of accuracy of the method is extremely good
since it is driven by the local regularity of the problem (as opposed to spectral
methods, for instance).
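The growth $\kappa = O(1/h^2)$ that motivates this preconditioning is easy to observe numerically. The sketch below is only an illustration of the ill-conditioning, not of the wavelet method: it assumes the simplest possible setting, the one-dimensional finite-difference Laplacian with Dirichlet conditions on $(0,1)$, and the function names are chosen for the example.

```python
import numpy as np

def laplacian_1d(n):
    """Finite-difference matrix of -u'' on (0, 1) with Dirichlet conditions,
    n interior points, mesh size h = 1 / (n + 1)."""
    h = 1.0 / (n + 1)
    main = 2.0 * np.ones(n)
    off = -1.0 * np.ones(n - 1)
    matrix = (np.diag(main) + np.diag(off, 1) + np.diag(off, -1)) / h ** 2
    return matrix, h

def condition_number(n):
    """kappa = ||M|| ||M^{-1}|| in the 2-norm, together with the mesh size."""
    matrix, h = laplacian_1d(n)
    return float(np.linalg.cond(matrix)), h

if __name__ == "__main__":
    for n in (15, 31, 63, 127):
        kappa, h = condition_number(n)
        # kappa * h^2 stays roughly constant, i.e. kappa grows like 1 / h^2
        print(n, kappa, kappa * h ** 2)
```

Halving $h$ multiplies the condition number by about four here, which is the behavior a preconditioner (or the renormalized wavelet basis above) is meant to remove.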

4.2.  Wavelet  and  multigrid  algorithms 

A conjugate gradient method converges slowly when the condition number
is large. Actually, the convergence is rather fast on the subspaces
corresponding to the largest eigenvalues, but slow for the small eigenvalues.
For an elliptic problem, small eigenvalues are associated with smooth, slowly
oscillating functions (i.e., with wavelets indexed by a small $j$), and large
eigenvalues with high frequency functions (i.e., with wavelets indexed by a
large $j$).

Roughly speaking, in a multigrid method, one starts by making a
few steps of conjugate gradient, until the high frequency component of
the solution is well approximated; the error is then a comparatively low
frequency function, which can thus be accurately calculated on a grid with
a double-size mesh. The resolution on the larger grid is performed again by
the same method, and one iterates this procedure. The part of the solution
which has frequencies around $2^j$ is thus calculated on the grid of size
$2^{-j}$. This is precisely what is performed in the wavelet method we
described. The splitting on functions defined on meshes of different sizes
(which is done in multigrid algorithms) is also performed by the wavelet
decomposition. The essential difference is the following: it is the
decomposition on wavelets and the recomposition which is iterative (by the fast
algorithms described in section 2.2), but the resolution is just done once by a
conjugate gradient. Actually, when the function is written in its wavelet
decomposition

$$f = \sum_j \sum_k c_{j,k}\,\psi_{j,k},$$

each block $\sum_k c_{j,k}\psi_{j,k}$ has its frequencies around $2^j$, so
that the purpose of the renormalisation that we make (multiply the terms of
this block by $2^{-j}$) is to bring all the eigenvalues of the matrix $M$
close together so that a conjugate gradient will converge fast. The multigrid
method just works the other way round: the iterative decomposition according
to the frequencies is performed during the resolution.


4.3.  Analysis  of  singular  operators 

Let us mention a recent extension of the ideas developed in Section 4.1 to
obtain estimates of the Green function of some singular elliptic operators
(see [12]). Consider the following operator

$$A(u) = -\nabla(a\nabla u) + u$$

where the function $a$ is positive, smooth, but may vanish. Suppose
furthermore that $a$ has a zero of order larger than 2 where it vanishes. Then
the following estimates on the Green function of $A$ and its derivatives hold

$$|\partial_x^\alpha \partial_y^\beta G(x,y)| \leq \frac{C}{\cdots\ \sup\left(\sqrt{a(x)a(y)},\ |x - y|^2\right)}.$$

This is obtained, as in the case of elliptic operators, by showing that $A$
and $A^{-1}$ are "almost diagonal" in a wavelet basis.


5.  Nonlinear  evolution  equations 

The numerical study of nonlinear evolution equations is a field where
wavelets should be very useful. The solutions of these equations often
have singularities which then propagate (even when the initial value is
smooth). The local analysis of the regularity performed by wavelets and
their properties of local approximation (see [16, 15]) give grounds for
this hope. Several numerical experiments have been done
for the one-dimensional Burgers equation (see for instance [11, 21, 22]). A
recent extension to the Korteweg-de Vries equation has also been implemented
([18]). Consider the following Burgers equation to which is added a small
viscosity term

$$\frac{\partial u}{\partial t} + u\,\frac{\partial u}{\partial x} = \varepsilon\,\frac{\partial^2 u}{\partial x^2}.$$

The methods used all consist of a finite difference discretization in time and
a wavelet decomposition in space. A possible scheme is the following

$$\frac{U_{n+1} - U_n}{\Delta t} + U_n\,\frac{\partial U_n}{\partial x} = \varepsilon\,\frac{\partial^2 U_{n+1}}{\partial x^2}$$

where $U_n$ represents the solution at the time $n\Delta t$. Here the
nonlinear term is treated explicitly, but the viscosity term appears
implicitly. Knowing $U_n$ by its wavelet coefficients, we want to calculate
$U_{n+1}$. In order to compute the wavelet coefficients of the nonlinear term
$U_n\,\partial U_n/\partial x$ we can either compute the values of $U_n$ and
$\partial U_n/\partial x$ on a regular grid (using the Fast Wavelet
Transform) or try to obtain more or less explicit formulas for the wavelet
coefficients of such products (this last issue is currently being studied).
The choice of an implicit algorithm obliges us to compute
$(\mathrm{Id} - \varepsilon\Delta t\,\partial^2/\partial x^2)^{-1}$ of a
function given by its wavelet decomposition. This is performed by computing
once for all

$$\left(\mathrm{Id} - \varepsilon\Delta t\,\frac{\partial^2}{\partial x^2}\right)^{-1}\psi_{j,k} = S_{j,k}.$$

Since this computation is rather costly, it does not allow for changing the
time scale $\Delta t$ during the calculation. An explicit scheme using
different time scales is being studied by Bacry, Mallat and Papanicolaou at the
Courant Institute. These methods give the solution with a very good accuracy,
especially near the shock, where the oscillations are small and very well
localized. Adaptive schemes with a local refinement around the shock are being
studied. The tracking of the shock is very easy since it takes place where the
large wavelet coefficients are.
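The time-stepping scheme above can be tried out in a few lines. The following sketch keeps the same semi-implicit structure (explicit nonlinear term, implicit viscosity term, the inverse of $\mathrm{Id} - \varepsilon\Delta t\,\partial^2/\partial x^2$ computed once) but, as a deliberate simplification, replaces the wavelet discretization in space by a Fourier discretization on a periodic interval, where that inverse is diagonal. The function name, the periodic domain and all parameter values are assumptions made for the example.

```python
import numpy as np

def burgers_semi_implicit(u0, eps, dt, steps, length=2.0 * np.pi):
    """(U_{n+1} - U_n)/dt + U_n dU_n/dx = eps d^2 U_{n+1}/dx^2 on a periodic grid.
    Fourier modes stand in for the wavelet basis of the text."""
    u = np.asarray(u0, dtype=float).copy()
    n = len(u)
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=length / n)   # angular wavenumbers
    # Symbol of (Id - eps*dt*d^2/dx^2)^{-1}; computed once, since dt is fixed.
    inverse_operator = 1.0 / (1.0 + eps * dt * k ** 2)
    for _ in range(steps):
        ux = np.real(np.fft.ifft(1j * k * np.fft.fft(u)))   # dU_n/dx
        rhs = u - dt * u * ux                                # explicit nonlinear term
        u = np.real(np.fft.ifft(inverse_operator * np.fft.fft(rhs)))
    return u

if __name__ == "__main__":
    x = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
    u = burgers_semi_implicit(np.sin(x), eps=0.1, dt=0.01, steps=50)
    print(u.min(), u.max())
```

As in the text, changing $\Delta t$ would require recomputing the inverse operator; in the Fourier setting this is cheap, whereas for a wavelet basis it is the costly step.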

A wavelet basis is an orthonormal basis for $L^2(\mathbb{R})$, the space of
square-integrable functions on the real line, of the form
$\{g_{nk}\}_{n,k \in \mathbb{Z}}$, where $g_{nk}(t) = 2^{n/2}\,g(2^n t - k)$
and $g$ is a single fixed function, the wavelet. Each multiresolution
analysis for $L^2(\mathbb{R})$ determines such a basis. To find a
multiresolution analysis, one can begin with a dilation equation
$f(t) = \sum_k c_k\, f(2t - k)$. If the solution $f$ (the scaling function)
satisfies certain requirements, then a multiresolution analysis and hence a
wavelet basis will follow. This paper surveys methods of achieving this goal.
Two separate problems are involved: first, solving a general dilation equation
to find a scaling function, and second, determining when such a scaling
function will generate a multiresolution analysis. We present two methods for
solving dilation equations, one based on the use of the Fourier transform and
one operating in the time domain utilizing linear algebra. The second method
characterizes all continuous, integrable scaling functions. We also present
methods of determining when a multiresolution analysis will follow from the
scaling function. We discuss simple conditions on the coefficients $\{c_k\}$
which are "almost" sufficient to ensure the existence of a wavelet basis (in
particular, they do ensure that $\{g_{nk}\}_{n,k \in \mathbb{Z}}$ is a tight
frame), and we present more complicated necessary and sufficient conditions
for the generation of a multiresolution analysis. The results presented are
due mainly to Cohen, Colella, Daubechies, Heil, Lagarias, Lawton, Mallat, and
Meyer, although several of the results have been independently investigated by
other groups, including Berger, Cavaretta, Dahmen, Deslauriers, Dubuc, Dyn,
Eirola, Gregory, Levin, Micchelli, Prautzsch, and Wang.

1. Introduction

The Haar system is the classical example of an affine, or wavelet,
orthonormal basis for the space $L^2(\mathbb{R})$ of square-integrable
functions on the real line.

† We thank David Colella of The MITRE Corporation, McLean, Virginia, for his
collaboration on some of the results reported in this paper, and for his review of this
document.

It consists of a set of translations and dilations of a single function, the
Haar wavelet $\psi(t) = \chi_{[0,1/2)}(t) - \chi_{[1/2,1)}(t)$, where
$\chi_E$ is the characteristic function of the set $E$. Precisely, the Haar
system has the form $\{\psi_{nk}\}_{n,k \in \mathbb{Z}}$, where
$\psi_{nk}(t) = 2^{n/2}\,\psi(2^n t - k)$. Such a simply-generated orthonormal
basis is very appealing; however, the fact that the Haar wavelet is
discontinuous severely limits the usefulness of the Haar system in
applications. Recently, examples of other, smooth wavelets which generate
affine orthonormal bases have been given, the first by Meyer [28]. Meyer's
example is an infinitely differentiable function which has a compactly
supported Fourier transform. Additional examples have been given by Lemarié
[26] and Battle [1] ($k$-times differentiable with exponential decay),
Daubechies [12] ($k$-times differentiable with compact support), and others.
Such smooth wavelets are better suited to applications than the Haar wavelet;
for example, they have been used in speech compression [8, 7].

Soon after Meyer's initial example, Mallat and Meyer proved that each
multiresolution analysis for $L^2(\mathbb{R})$ determines a wavelet basis
[27, …]. All of the wavelet bases mentioned above are determined by an
appropriate multiresolution analysis (although not all wavelet bases are
associated with multiresolution analyses). A multiresolution analysis
$(\{V_n\}, f)$ is defined as a sequence of subspaces
$\{V_n\}_{n \in \mathbb{Z}}$ of $L^2(\mathbb{R})$ such that

1) $V_n \subset V_{n+1}$ for all $n$,

2) $\bigcap V_n = \{0\}$,

3) $\bigcup V_n$ is dense in $L^2(\mathbb{R})$, and

4) $h(t) \in V_n \Leftrightarrow h(2t) \in V_{n+1}$,

together with a function $f \in V_0$ such that the collection of integer
translates of $f$, $\{f(t - k)\}_{k \in \mathbb{Z}}$, forms an orthonormal
basis for $V_0$. Given such a multiresolution analysis we have
$f \in V_0 \subset V_1$. As $\{2^{1/2} f(2t - k)\}_{k \in \mathbb{Z}}$ is an
orthonormal basis for $V_1$, there must therefore exist scalars $c_k$ such that

$$f(t) = \sum_{k \in \mathbb{Z}} c_k\, f(2t - k). \tag{1.1}$$

71ns  is  referred  to  as  die  (ituiuced)  dilation  equation,  and  .ts  suiit:  n 
the  ^calin^  tiinclion  It  can  be  proved  that  it  we  define  the  u  aie/et  , 

ii  1 1 )  1  1 1 '  c \  k  ’ '  2 1  k  I 

k.  tr  C 

(where  N  is  as  defined  below),  then  q  will  generate  an  orthonormal  basis 
L‘(fB)  of  the  form  ,g„k  !n.cfC,,c'f.  Section  4.  From  1 1 .2),  it  tollows  that  proper 
ties  of  the  wavelet  g  such  as  continuity,  differentiability,  etc.,  are  determined 
by the corresponding properties of the scaling function $f$.

Remark 1.1. For the Haar system,

$$V_0 = \{h \in L^2(\mathbb{R}) : h \text{ is constant on each interval } [k, k+1)\},$$

the induced dilation equation is $f(t) = f(2t) + f(2t-1)$ (i.e., $c_0 = c_1 = 1$
and all other $c_k = 0$), the scaling function is $f = \chi_{[0,1)}$, and the wavelet is the
Haar wavelet $g(t) = \psi(t) = f(2t) - f(2t-1)$.
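
The Haar case can be checked numerically. The following is a minimal Python sketch, with function names (`haar_scaling`, `wavelet_coeffs`, `refine`, `haar_wavelet`) chosen here purely for illustration; it evaluates both sides of the dilation equation on a dyadic grid and builds the wavelet from the coefficients $(-1)^k c_{N-k}$ of (1.2).

```python
def haar_scaling(t):
    """Haar scaling function f = chi_[0,1)."""
    return 1.0 if 0.0 <= t < 1.0 else 0.0


def wavelet_coeffs(c):
    """Coefficients (-1)^k c_{N-k} of (1.2) for c = [c_0, ..., c_N]."""
    n = len(c) - 1
    return [(-1) ** k * c[n - k] for k in range(n + 1)]


def refine(coeffs, f, t):
    """Evaluate sum_k coeffs[k] * f(2t - k)."""
    return sum(ck * f(2.0 * t - k) for k, ck in enumerate(coeffs))


c_haar = [1.0, 1.0]
samples = [i / 16.0 - 0.5 for i in range(33)]  # dyadic grid on [-0.5, 1.5]

# Dilation equation: f(t) = f(2t) + f(2t - 1) at every sample point.
dilation_ok = all(refine(c_haar, haar_scaling, t) == haar_scaling(t) for t in samples)


def haar_wavelet(t):
    """g(t) = f(2t) - f(2t - 1), assembled from the coefficients of (1.2)."""
    return refine(wavelet_coeffs(c_haar), haar_scaling, t)
```

Because every grid point is dyadic, the comparison is exact in floating point; for smoother scaling functions a tolerance would be needed.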

To find wavelet bases for $L^2(\mathbb{R})$ it suffices to construct multiresolution
analyses. One method of achieving this is the following. Choose a set of
coefficients $\{c_k\}$ and solve the corresponding dilation equation (1.1) for the
scaling function $f$. If $f$ is orthogonal to each of its integer translates then
define $V_0$ to be the span of the integer translates of $f$ and define $V_n$ for $n \in \mathbb{Z}$
as the appropriate dilation of $V_0$ (i.e., $V_n = \operatorname{span}\{f_{nk}\}_{k \in \mathbb{Z}}$). If $\bigcap V_n = \{0\}$
and if $\bigcup V_n$ is dense in $L^2(\mathbb{R})$ then $(\{V_n\}, f)$ is a multiresolution analysis, and
therefore the wavelet $g$ defined by (1.2) will generate an affine orthonormal
basis for $L^2(\mathbb{R})$. If this is the case then we say that the coefficients $\{c_k\}$ have
determined the multiresolution analysis $(\{V_n\}, f)$.

There are obviously two separate difficulties in this approach, namely,

1) solving a given dilation equation to find a scaling function, and
2) determining conditions under which a multiresolution analysis will
follow from such a scaling function, i.e., conditions under which $f$ will
be orthogonal to its integer translates, etc.

We survey results on these two problems in this paper. A shorter survey,
which also includes a discussion of the application of wavelets to fast signal
processing algorithms, is [33].

The first problem, that of solving a general dilation equation, is not
restricted in application to wavelet theory. In particular, dilation equations
play a role in spline theory, interpolation and subdivision methods, and
smooth curve generation [2, 4, 5, 17, 20, 19, 2^, 30]. Although we focus
in this paper on results by groups involved in wavelet research (including
Cohen, Colella, Daubechies, Heil, Lagarias, Lawton, Mallat, and Meyer),
many of the same or related results have been independently obtained by
groups involved in these other areas (including Berger, Cavaretta, Dahmen,
Deslauriers, Dubuc, Dyn, Eirola, Gregory, Levin, Micchelli, Prautzsch, and
Wang). In some cases, results by these other groups were obtained earlier or
are more complete than the ones we discuss.

In Sections 2 and 3 we consider two methods of solving general dilation
equations. The methods in Section 2 are based on the use of the Fourier
transform. We prove results due to Daubechies and Lagarias showing that
every dilation equation has a solution in the sense of distributions and
that integrable solutions, if they exist, are unique up to multiplication by a
constant. We then present results of Daubechies and Mallat which show
when integrable solutions to dilation equations will exist, and results of
Colella and Heil showing when they will not (these results do not completely
characterize those dilation equations which have integrable solutions).

In Section 3 we present a time-domain based method for solving certain
dilation equations, due to Daubechies and Lagarias, which utilizes linear
algebra. This method produces continuous, integrable scaling functions if
appropriate conditions hold. Colella and Heil have proved that this method
characterizes those dilation equations which have continuous, integrable
solutions.

In Section 4 we consider the second problem. We show that if the
coefficients $\{c_k\}$ determine a multiresolution analysis then necessarily

$$\sum_k c_{2k} = \sum_k c_{2k+1} = 1 \tag{1.3}$$

and

$$\sum_k c_k\, c_{k+2l} = 2\,\delta_{0l} \quad \text{for every } l \in \mathbb{Z}, \tag{1.4}$$

where $\delta_{ij}$ is the Kronecker delta, i.e., $\delta_{ij} = 1$ if $i = j$ and 0 otherwise. We then
prove a result due to Lawton which shows that (1.3) and (1.4) are "almost"
sufficient to generate a wavelet orthonormal basis. In particular, Lawton
has proved that if (1.3) and (1.4) are satisfied then $\{g_{nk}\}_{n,k \in \mathbb{Z}}$ will be a tight
frame, i.e., the reconstruction property

$$h = \sum_{n,k} \langle h, g_{nk} \rangle\, g_{nk} \quad \text{for all } h \in L^2(\mathbb{R}) \tag{1.5}$$

will be satisfied, although $\{g_{nk}\}_{n,k \in \mathbb{Z}}$ need not be an orthogonal set. (The
general theory of frames was developed by Duffin and Schaeffer in [18]
in connection with nonharmonic Fourier series. The connection between
frames and wavelet theory is surveyed in [23], and researched in depth in
[13].) We also discuss more complicated conditions, independently derived
by Lawton and Cohen, which are both necessary and sufficient to ensure
that a multiresolution analysis, and therefore a wavelet orthonormal basis,
is generated. Lawton has proved that almost all choices of coefficients $\{c_k\}$
which satisfy (1.3) and (1.4) also satisfy these conditions for orthogonality.
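
Conditions (1.3) and (1.4) are easy to test for a finite coefficient list. The sketch below is illustrative only: the helper names and the tolerance are assumptions, and coefficients outside $c_0, \ldots, c_N$ are taken to be zero. It checks the Haar coefficients, the four Daubechies coefficients with $(c_0, c_3) = ((1+\sqrt{3})/4, (1-\sqrt{3})/4)$, the set $(1, 0, 0, 1)$, and a set that satisfies (1.3) but not (1.4).

```python
import math


def satisfies_1_3(c, tol=1e-12):
    """(1.3): even-indexed and odd-indexed coefficients each sum to 1."""
    return abs(sum(c[0::2]) - 1.0) < tol and abs(sum(c[1::2]) - 1.0) < tol


def satisfies_1_4(c, tol=1e-12):
    """(1.4): sum_k c_k c_{k+2l} equals 2 for l = 0 and 0 for l != 0."""
    n = len(c)
    for shift in range(0, n, 2):  # shift = 2l; negative l follows by symmetry
        s = sum(c[k] * c[k + shift] for k in range(n - shift))
        target = 2.0 if shift == 0 else 0.0
        if abs(s - target) > tol:
            return False
    return True


SQRT3 = math.sqrt(3.0)
haar = [1.0, 1.0]
d4 = [(1 + SQRT3) / 4, (3 + SQRT3) / 4, (3 - SQRT3) / 4, (1 - SQRT3) / 4]
corner = [1.0, 0.0, 0.0, 1.0]  # the point (c0, c3) = (1, 1)
flat = [0.5, 0.5, 0.5, 0.5]    # satisfies (1.3) only
```

Passing both checks is a necessary condition only; deciding whether a multiresolution analysis is actually generated requires the further conditions discussed in Section 4.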

For simplicity of presentation, we assume throughout this paper that
coefficients $\{c_k\}$ are given which are real with only $c_0, \ldots, c_N$ nonzero (i.e., we
consider only Daubechies-type wavelets). In Sections 2 and 3, we assume in
addition that (1.3) is satisfied. These conditions are not necessary for many
of the proofs, and many of the results in which they are necessary can be
modified for more general situations. The fact that the coefficients $\{c_k\}$ are
real implies that the scaling function $f$ will be real-valued.

Given these restrictions, the Haar system is of course the only example
with $N = 1$. It can be shown that multiresolution analyses can be produced
only when $N$ is odd. We will use the case $N = 3$ to illustrate many of
the results in this paper. For this case, assumption (1.3) reduces to the
statement $c_0 + c_2 = c_1 + c_3 = 1$, i.e., the collection of four-coefficient dilation
equations with the given restrictions is a two-parameter family. We select the
independent parameters to be $c_0$ and $c_3$, and represent this collection of four-coefficient
dilation equations as the $(c_0, c_3)$-plane. Figure 1.1 shows several
geometrical objects in the $(c_0, c_3)$-plane. The following results regarding
these geometrical objects are discussed in this paper.

1)  There  are  no  integrable  solutions  to  dilation  equations  corresponding 
to  points  on  or  outside  the  ellipse,  with  the  single  exception  of  the  point 
(1,1). 

2)  There  do  exist  integrable  solutions  to  dilation  equations  corresponding 
to  points  on  and  inside  the  circle,  and  inside  the  shaded  region. 

3)  There  are  continuous,  integrable  solutions  to  dilation  equations  in  a 
large  portion  of  the  triangle,  and  no  continuous,  integrable  solutions 
outside  the  triangle. 

4)  There  are  differentiable,  integrable  solutions  to  dilation  equations  on 
the  solid  portion  of  the  dashed  line. 

5) Each point on the circle, with the single exception of the point (1,1),
determines a multiresolution analysis and therefore a wavelet basis for
$L^2(\mathbb{R})$.

We refer to this circle as the circle of orthogonality.
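
For $N = 3$, substituting $c_1 = 1 - c_3$ and $c_2 = 1 - c_0$ into (1.4) with $l = 1$ gives $c_0(1 - c_0) + c_3(1 - c_3) = 0$, that is, $(c_0 - 1/2)^2 + (c_3 - 1/2)^2 = 1/2$. This explicit equation is a derivation added here, offered as a plausible description of the circle in Figure 1.1 rather than a formula quoted from the text. The Python sketch below (all names are illustrative) compares the residuals of (1.4) with the residual of this circle equation at a few points.

```python
import math


def four_coeffs(c0, c3):
    """[c0, c1, c2, c3] with c1 = 1 - c3 and c2 = 1 - c0, as forced by (1.3)."""
    return [c0, 1.0 - c3, 1.0 - c0, c3]


def orthogonality_residuals(c0, c3):
    """Residuals of (1.4) for N = 3: (sum_k c_k^2 - 2, c0*c2 + c1*c3)."""
    c = four_coeffs(c0, c3)
    return (sum(x * x for x in c) - 2.0, c[0] * c[2] + c[1] * c[3])


def circle_residual(c0, c3):
    """(c0 - 1/2)^2 + (c3 - 1/2)^2 - 1/2, zero on the derived circle."""
    return (c0 - 0.5) ** 2 + (c3 - 0.5) ** 2 - 0.5


SQRT3 = math.sqrt(3.0)
points = {
    "D4": ((1 + SQRT3) / 4, (1 - SQRT3) / 4),
    "(.6, -.2)": (0.6, -0.2),
    "(1, 1)": (1.0, 1.0),
}
```

In this parametrization the two residuals of (1.4) are $2X$ and $-X$, where $X$ is the circle residual, so both vanish exactly when the point lies on the circle.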

Throughout this paper, $L^p(\mathbb{R})$ will denote the Lebesgue space of $p$-integrable
functions on the real line, with norm $\|f\|_p = (\int |f(t)|^p\, dt)^{1/p}$ for
$1 \le p < \infty$ and $\|f\|_\infty = \operatorname{ess\,sup} |f(t)|$. The inner product of functions $f$, $g$
is $\langle f, g \rangle = \int f(t)\, \overline{g(t)}\, dt$. The Fourier transform of an integrable function $f$
is $\hat f(\gamma) = \int f(t)\, e^{i\gamma t}\, dt$. Integrals with unspecified limits are over the entire
real line.

2.  Fourier  methods 

By  considering  the  Fourier  transform  of  the  dilation  equation,  we  can  prove 
that  every  dilation  equation  has  a  solution  in  the  sense  of  distributions. 
Consideration  of  the  smoothness  and  decay  of  the  Fourier  transforms  of 
these  distributions  can  indicate  whether  or  not  these  distributions  are  given 
by  functions  on  the  real  line.  We  assume  throughout  this  section  that  (1.3) 
is  satisfied. 

Some notation is required to adequately describe distributions. We let
$S(\mathbb{R})$ denote the Schwartz space of infinitely differentiable, rapidly decreasing
functions on the real line, and let $S'(\mathbb{R})$ denote its topological dual, the
space of tempered distributions. For functions $\varphi$ we define the translation
operator $T_a \varphi(t) = \varphi(t - a)$ and the dilation operator $D_a \varphi(t) = \varphi(at)$.

Translation and dilation of a distribution $\nu \in S'(\mathbb{R})$ are defined by duality, i.e.,
$\langle T_a \nu, \varphi \rangle = \langle \nu, T_{-a} \varphi \rangle$ and $\langle D_a \nu, \varphi \rangle = a^{-1} \langle \nu, D_{a^{-1}} \varphi \rangle$. With this notation,
the dilation equation (1.1) has the form $f = \sum_k c_k\, D_2 T_k f$. Therefore, we say
that $\nu \in S'(\mathbb{R})$ is a scaling distribution if

$$\nu = \sum_k c_k\, D_2 T_k \nu,$$

i.e., if $\langle \nu, \varphi \rangle = \sum_k c_k \langle D_2 T_k \nu, \varphi \rangle$ for all $\varphi \in S(\mathbb{R})$. By taking Fourier
transforms, we therefore have that $\nu$ is a scaling distribution if and only if

$$D_2 \hat\nu = m_0\, \hat\nu,$$

where $m_0(\gamma) = (1/2) \sum_k c_k e^{ik\gamma}$. If it is the case that $\hat\nu$ is a function on $\mathbb{R}$ then
this is equivalent to

$$\hat\nu(2\gamma) = m_0(\gamma)\, \hat\nu(\gamma) \quad \text{for a.e. } \gamma \in \mathbb{R}. \tag{2.1}$$

Assume now that $\hat\nu$ is a continuous function on $\mathbb{R}$. Then we
can iterate (2.1), obtaining (formally) $\hat\nu(\gamma) = \hat\nu(\gamma/2^n) \prod_{j=1}^{n} m_0(\gamma/2^j) \to
\hat\nu(0) \prod_{j=1}^{\infty} m_0(\gamma/2^j)$. Daubechies established that this infinite product
converges, and proved with Lagarias the following result, cf. [12, 15].

Theorem 2.1.

1) $P(\gamma) = \prod_{j=1}^{\infty} m_0(\gamma/2^j)$ converges uniformly on compact sets to a continuous
function which has polynomial growth at infinity.

2) Define $f$ to be the tempered distribution such that $\hat f = P$. Then $\hat f(2\gamma) =
m_0(\gamma)\, \hat f(\gamma)$ for all $\gamma$, so $f$ is a scaling distribution. The support of $f$ is
contained in $[0, N]$.

3) If $\nu$ is another scaling distribution such that $\hat\nu$ is a function on $\mathbb{R}$ which
is continuous at zero then $\nu = \hat\nu(0)\, f$.

4) If a nonzero integrable solution to (1.1) exists then it is $f$, up to
multiplication by a constant, and $\int f(t)\, dt = 1$.

We call the distribution $f$ defined in Theorem 2.1 the canonical scaling
distribution. Other solutions to the dilation equation are given in [15], and
certain classes of solutions are characterized in [11].
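
The infinite product of Theorem 2.1 can be explored numerically by truncation. The sketch below assumes the conventions $m_0(\gamma) = (1/2) \sum_k c_k e^{ik\gamma}$ and $\hat f(\gamma) = \int f(t) e^{i\gamma t}\, dt$; the function names and the truncation length are illustrative choices. For the Haar coefficients the partial products are compared with the closed form $\int_0^1 e^{i\gamma t}\, dt = (e^{i\gamma} - 1)/(i\gamma)$.

```python
import cmath


def m0(c, gamma):
    """m0(gamma) = (1/2) * sum_k c_k * exp(i k gamma)."""
    return 0.5 * sum(ck * cmath.exp(1j * k * gamma) for k, ck in enumerate(c))


def truncated_product(c, gamma, terms=40):
    """Partial product prod_{j=1}^{terms} m0(gamma / 2^j)."""
    p = 1.0 + 0.0j
    for j in range(1, terms + 1):
        p *= m0(c, gamma / 2.0 ** j)
    return p


def haar_transform(gamma):
    """Closed form of the transform of chi_[0,1) for gamma != 0."""
    return (cmath.exp(1j * gamma) - 1.0) / (1j * gamma)
```

For the Haar case the partial product telescopes, so the truncation error after $J$ factors is of relative size about $|\gamma| / 2^{J+1}$; no such closed form is available for general coefficients.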

The proof of Theorem 2.1 requires only that $\sum c_k = 2$; if this is not the
case then a canonical solution of the dilation equation can still be defined,
but the uniqueness results of Theorem 2.1 will not hold. Even with the
assumption $\sum c_k = 2$, uniqueness in function spaces other than $L^1(\mathbb{R})$ may
not hold. For example, the Hilbert transform $H\nu$ of any solution $\nu$ of a
dilation equation is also a solution of the same dilation equation. Since $H$
maps $L^p(\mathbb{R})$ into $L^p(\mathbb{R})$ for $1 < p < \infty$, uniqueness cannot hold in any of
these spaces. Additional uniqueness criteria and methods of generating new
solutions to dilation equations from known solutions are given in [11].

Existence of an integrable solution of a dilation equation is not guaranteed.
The following, from [11], is an easily checkable necessary condition for
the existence of such solutions, based on the fact that the Fourier transform
of an integrable solution must decay at infinity.

Theorem 2.2. Given $x \in [0, 2\pi)$. Assume that the set
$\{x \bmod 2\pi,\ 2x \bmod 2\pi,\ \ldots,\ 2^{n-1}x \bmod 2\pi\}$
is invariant mod $2\pi$ under multiplication by 2. If

$$\prod_{j=0}^{n-1} |m_0(2^j x)| \ge 1 \quad \text{and} \quad m_0(2^{-j} x) \ne 0 \ \text{ for all } j \ge 1$$

then the canonical scaling distribution is not an integrable function, and
therefore there do not exist any integrable solutions to (1.1).


Remark 2.3. Consider the case $N = 3$. The set $\{2\pi/3, 4\pi/3\}$ is invariant
mod $2\pi$ under multiplication by 2, and $|m_0(2\pi/3)\, m_0(4\pi/3)| \ge 1$ for all
$(c_0, c_3)$ on and outside the ellipse shown in Figure 1.1. The additional
hypotheses of Theorem 2.2 are also satisfied for all but countably many of
these points, and therefore for almost no point on or outside the ellipse can
an integrable solution to the corresponding dilation equation exist. All but
one of the countably many remaining points are also eliminated when the
3-cycle $\{2\pi/7, 4\pi/7, 8\pi/7\}$ is checked in addition [11]. The remaining single
point is (1,1); the integrable solution to the dilation equation corresponding
to this point is $f = (1/3)\chi_{[0,3)}$.
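
The product hypothesis of Theorem 2.2 for the 2-cycle $\{2\pi/3, 4\pi/3\}$ can be evaluated directly. The following sketch assumes $m_0(\gamma) = (1/2) \sum_k c_k e^{ik\gamma}$ and uses illustrative names; it computes only the cycle product and, at the point (1,1), the value $m_0(x/2)$, and does not test the non-vanishing hypothesis in general.

```python
import cmath
import math


def m0(c, gamma):
    """m0(gamma) = (1/2) * sum_k c_k * exp(i k gamma)."""
    return 0.5 * sum(ck * cmath.exp(1j * k * gamma) for k, ck in enumerate(c))


def four_coeffs(c0, c3):
    """[c0, c1, c2, c3] with c1 = 1 - c3 and c2 = 1 - c0, as forced by (1.3)."""
    return [c0, 1.0 - c3, 1.0 - c0, c3]


def cycle_product(c, x, n):
    """|m0(x) * m0(2x) * ... * m0(2^(n-1) x)|."""
    p = 1.0
    for j in range(n):
        p *= abs(m0(c, (2 ** j) * x))
    return p


X = 2.0 * math.pi / 3.0
SQRT3 = math.sqrt(3.0)
prod_corner = cycle_product(four_coeffs(1.0, 1.0), X, 2)
prod_d4 = cycle_product(four_coeffs((1 + SQRT3) / 4, (1 - SQRT3) / 4), X, 2)
prod_far = cycle_product(four_coeffs(2.0, 2.0), X, 2)
# At (1, 1) the extra hypothesis already fails for j = 1: m0(x / 2) vanishes.
corner_zero = abs(m0(four_coeffs(1.0, 1.0), X / 2.0))
```

At (1,1) the product equals 1 but $m_0(\pi/3) = 0$, which is consistent with this point being the exception noted in Remark 2.3.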

Theorem 2.2 deals with non-existence of integrable solutions by establishing
conditions under which the Fourier transform $P = \hat f$ of the canonical
scaling distribution $f$ will not decay at infinity. Alternatively, by imposing
sufficient decay on $\hat f$ we can obtain $\hat f \in L^2(\mathbb{R})$, and therefore $f \in L^1(\mathbb{R})$
since $f$ has compact support. This is made precise in the next theorem, due
to Daubechies [12]. The notation used in the theorem is as follows. Since
$2 m_0(\pi) = \sum (-1)^k c_k = \sum c_{2k} - \sum c_{2k+1} = 0$, we can factor a term of the
form $1 + e^{i\gamma}$ from $m_0(\gamma)$. If the zero at $\pi$ has multiplicity at least $L$ then
$m_0(\gamma) = ((1 + e^{i\gamma})/2)^L\, Q(\gamma)$, and therefore

$$\hat f(\gamma) = \prod_{j=1}^{\infty} m_0(\gamma/2^j) = \left( e^{i\gamma/2}\, \frac{\sin(\gamma/2)}{\gamma/2} \right)^{L} \prod_{j=1}^{\infty} Q(\gamma/2^j).$$

Theorem 2.4.

1) If $\|Q\|_\infty < 2^{L-1/2}$ then the canonical scaling distribution $f$ is an
integrable function.

2) If $\|Q\|_\infty < 2^{L-1}$ then the canonical scaling distribution $f$ is a continuous,
integrable function.

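Proof. We prove only the first statement.

Set $M(\gamma) = \prod_{j=1}^{\infty} Q(\gamma/2^j)$; this is a continuous function. Define $R =
\|M \cdot \chi_{[-1,1]}\|_\infty$; then since $M(2\gamma) = Q(\gamma)\, M(\gamma)$ we have $\|M \cdot \chi_{[-2^n, 2^n]}\|_\infty \le
\|Q\|_\infty^n R$, whence $|M(\gamma)| \le C\, |\gamma|^{\log_2 \|Q\|_\infty}$ for $|\gamma| \ge 1$ and some constant $C$. Therefore,

$$|\hat f(\gamma)|^2 \le C' \left| \frac{\sin(\gamma/2)}{\gamma/2} \right|^{1+p},$$

where $p = 2L - 1 - 2 \log_2 \|Q\|_\infty$ and $C'$ is another constant. Since $p > 0$,
$|(\sin \gamma/2)/(\gamma/2)|^{1+p}$ is integrable, and therefore $\hat f \in L^2(\mathbb{R})$. Hence $f \in
L^2(\mathbb{R})$, and therefore $f$ is integrable since it has compact support. ∎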


Remark 2.5. For $N = 3$, the multiplicity $L$ is one except for those points on
the dashed line shown in Figure 1.1; for those points, $L = 2$.

The region of points $(c_0, c_3)$ for which the hypothesis of the first part of
Theorem 2.4 is satisfied with $L = 1$ is the shaded region shown in Figure 1.1,
i.e., integrable scaling functions exist for all points in this region [11] (see
also Remark 2.7 for an additional region).

No points in the $(c_0, c_3)$-plane satisfy the second part of Theorem 2.4
with $L = 1$. For $L = 2$, i.e., $c_3 = 1/2 - c_0$, Theorem 2.4 implies that continuous
solutions exist for $-1/4 < c_0 < 3/4$. This result is inferior to the one obtained
in Section 3, where it is shown that continuous scaling functions occur on
this line precisely when $-1/2 < c_0 < 1$, and in fact are differentiable if
$0 < c_0 < 1/2$ (i.e., on the solid portion of the line shown in Figure 1.1).
Moreover, it is shown in Section 3 that continuous scaling functions occur
over a large region of the $(c_0, c_3)$-plane, including the regions shown in
Figures 3.1-3.6.
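
The interval $-1/4 < c_0 < 3/4$ can be reproduced numerically. On the line $c_3 = 1/2 - c_0$ the coefficients are $(c_0, 1/2 + c_0, 1 - c_0, 1/2 - c_0)$, and a short computation (added here, and assuming $m_0(\gamma) = (1/2) \sum_k c_k e^{ik\gamma}$) gives $Q(\gamma) = 2c_0 + (1 - 2c_0) e^{i\gamma}$, so that $\|Q\|_\infty = |2c_0| + |1 - 2c_0|$. The sketch below, with illustrative names, checks this factorization and estimates $\|Q\|_\infty$ on a grid.

```python
import cmath
import math


def q_value(c0, gamma):
    """Q(gamma) = 2 c0 + (1 - 2 c0) e^{i gamma} on the line c3 = 1/2 - c0."""
    return 2.0 * c0 + (1.0 - 2.0 * c0) * cmath.exp(1j * gamma)


def m0_on_line(c0, gamma):
    """m0 for the coefficients (c0, 1/2 + c0, 1 - c0, 1/2 - c0)."""
    c = [c0, 0.5 + c0, 1.0 - c0, 0.5 - c0]
    return 0.5 * sum(ck * cmath.exp(1j * k * gamma) for k, ck in enumerate(c))


def factor_residual(c0, gamma):
    """|m0 - ((1 + e^{i gamma}) / 2)^2 * Q|, which should vanish."""
    z = cmath.exp(1j * gamma)
    return abs(m0_on_line(c0, gamma) - ((1.0 + z) / 2.0) ** 2 * q_value(c0, gamma))


def q_sup(c0, samples=4096):
    """Grid estimate of sup |Q|; the grid contains gamma = 0 and gamma = pi."""
    return max(abs(q_value(c0, 2.0 * math.pi * s / samples)) for s in range(samples))
```

The second part of Theorem 2.4 with $L = 2$ asks for `q_sup(c0) < 2`; for the Daubechies value $c_0 = (1+\sqrt{3})/4$ the supremum is $\sqrt{3}$.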

Eirola has taken a different (but still Fourier-based) approach in [21].
He obtains conditions under which scaling functions will be continuous and
estimates for the Sobolev exponent of continuity for these scaling functions.
In Section 3 we discuss a time-domain method for obtaining estimates for
the Hölder exponent of continuity of scaling functions.

We end this section with an adaptation of an existence result due to
Mallat [27]; part of the proof we give is due to Lawton [24].

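Theorem 2.6. If

$$|m_0(\gamma)|^2 + |m_0(\gamma + \pi)|^2 \le 1 \quad \text{for all } \gamma,$$

then the canonical scaling distribution is an integrable function.

Proof. Set

$$u_n(\gamma) = \chi_{[-2^n\pi,\, 2^n\pi]}(\gamma) \prod_{j=1}^{n} m_0(\gamma/2^j). \tag{2.2}$$

By Theorem 2.1, $u_n$ converges uniformly on compact sets to the Fourier
transform of the canonical scaling distribution $f$. Now, since $\prod_{j=1}^{n} m_0(\gamma/2^j)$
is $2^{n+1}\pi$-periodic and $\prod_{j=1}^{n-1} m_0(\gamma/2^j)$ is $2^n\pi$-periodic,

$$\begin{aligned}
\|u_n\|_2^2 &= \int_0^{2^{n+1}\pi} \prod_{j=1}^{n} |m_0(\gamma/2^j)|^2\, d\gamma \\
&= \int_0^{2^n\pi} \left( |m_0(\gamma/2^n)|^2 + |m_0(\gamma/2^n + \pi)|^2 \right) \prod_{j=1}^{n-1} |m_0(\gamma/2^j)|^2\, d\gamma \\
&\le \|u_{n-1}\|_2^2
\end{aligned}$$

and, by a similar argument, $\|u_1\|_2^2 \le 2\pi$. Therefore $\{u_n\}$ is contained in the
ball in $L^2(\mathbb{R})$ of radius $\sqrt{2\pi}$ and therefore has a weak* accumulation point.
Since $u_n(\gamma) \to \hat f(\gamma)$ pointwise, this accumulation point must be $\hat f$, whence
$f \in L^2(\mathbb{R})$. Since $f$ has compact support, it is therefore integrable as well. ∎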


Remark 2.7. For $N = 3$, the hypothesis of Theorem 2.6 is satisfied for all
points $(c_0, c_3)$ on and inside the circle shown in Figure 1.1. Therefore, there
exist integrable solutions for all dilation equations corresponding to such points. By the
remark following Theorem 2.4, integrable solutions also exist for points in
the shaded region in Figure 1.1. The union of these two regions does not
exhaust the set of four-coefficient dilation equations which have integrable
solutions, cf. [11].
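
The hypothesis of Theorem 2.6 can be sampled on a grid. This is a hedged sketch: it assumes $m_0(\gamma) = (1/2) \sum_k c_k e^{ik\gamma}$, the names are illustrative, and a grid maximum is only an estimate of the supremum (for four coefficients the quantity is a trigonometric polynomial in $2\gamma$ whose extrema lie on the grid used).

```python
import cmath
import math


def m0(c, gamma):
    """m0(gamma) = (1/2) * sum_k c_k * exp(i k gamma)."""
    return 0.5 * sum(ck * cmath.exp(1j * k * gamma) for k, ck in enumerate(c))


def four_coeffs(c0, c3):
    """[c0, c1, c2, c3] with c1 = 1 - c3 and c2 = 1 - c0, as forced by (1.3)."""
    return [c0, 1.0 - c3, 1.0 - c0, c3]


def mallat_sup(c, samples=2048):
    """Grid estimate of sup over gamma of |m0(gamma)|^2 + |m0(gamma + pi)|^2."""
    best = 0.0
    for s in range(samples):
        g = 2.0 * math.pi * s / samples
        best = max(best, abs(m0(c, g)) ** 2 + abs(m0(c, g + math.pi)) ** 2)
    return best


SQRT3 = math.sqrt(3.0)
sup_d4 = mallat_sup(four_coeffs((1 + SQRT3) / 4, (1 - SQRT3) / 4))  # on the circle
sup_inside = mallat_sup(four_coeffs(0.5, 0.0))                      # inside the circle
sup_outside = mallat_sup(four_coeffs(2.0, 2.0))                     # far outside
```

The first two estimates come out at 1 up to rounding, while the third is well above 1, in line with Remark 2.7.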


3.  Matrix  methods 

In [16], Daubechies and Lagarias proved sufficient conditions under which
a dilation equation has a continuous, integrable solution (or, more generally,
an integrable and $n$-times differentiable solution). In particular, they
proved that if the joint spectral radius $\hat\rho(T_0|_V, T_1|_V)$ of two $N \times N$ matrices
$T_0$, $T_1$ (whose entries contain only the coefficients $\{c_k\}$) restricted to a certain
$(N-1)$-dimensional subspace $V$ is less than one then the canonical scaling
distribution $f$ is a continuous and integrable function, and, moreover, is
Hölder continuous with Hölder exponent $\alpha \ge -\log_2 \hat\rho(T_0|_V, T_1|_V)$. We outline
this result in this section. In [10], this result is extended to a necessary
and sufficient condition; in particular, it is shown there that the canonical
scaling distribution $f$ is a continuous and integrable function if and only if
$\hat\rho(T_0|_W, T_1|_W) < 1$, where $W$ is an appropriate subspace of $V$, and that in
this case $\alpha = -\log_2 \hat\rho(T_0|_W, T_1|_W)$. It is conjectured in [10] that $W = V$ in
general, except for a set of coefficients of measure zero, and it is proved in [9]
that $\hat\rho(T_0|_W, T_1|_W) = \hat\rho(T_0|_V, T_1|_V)$ for all choices of coefficients with $N \le 3$.
We assume throughout this section that (1.3) is satisfied.

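Given the coefficients $\{c_k\}$, define the $N \times N$ matrices $T_0$ and $T_1$ by
$(T_0)_{ij} = c_{2i-j-1}$ and $(T_1)_{ij} = c_{2i-j}$. For example, for $N = 3$ we have

$$T_0 = \begin{pmatrix} c_0 & 0 & 0 \\ c_2 & c_1 & c_0 \\ 0 & c_3 & c_2 \end{pmatrix}
\quad \text{and} \quad
T_1 = \begin{pmatrix} c_1 & c_0 & 0 \\ c_3 & c_2 & c_1 \\ 0 & 0 & c_3 \end{pmatrix}.$$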
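
The index formulas translate directly into code. The sketch below (the function name `dilation_matrices` is an illustrative choice) builds $T_0$ and $T_1$ as nested lists for any $N$, treating $c_k$ as zero outside $0 \le k \le N$.

```python
def dilation_matrices(c):
    """Return (T0, T1) with (T0)_{ij} = c_{2i-j-1} and (T1)_{ij} = c_{2i-j}, 1 <= i, j <= N."""
    n = len(c) - 1

    def coeff(k):
        # Coefficients outside 0..N are zero by assumption.
        return c[k] if 0 <= k <= n else 0.0

    t0 = [[coeff(2 * i - j - 1) for j in range(1, n + 1)] for i in range(1, n + 1)]
    t1 = [[coeff(2 * i - j) for j in range(1, n + 1)] for i in range(1, n + 1)]
    return t0, t1
```

Calling `dilation_matrices([c0, c1, c2, c3])` reproduces the two $3 \times 3$ matrices displayed above; when (1.3) holds, every column of both matrices sums to 1, which is the left-eigenvector property used in the proof of Theorem 3.2.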

For $x \in [0, 1]$, $x \ne 1/2$, define

$$\tau x = \begin{cases} 2x, & 0 \le x < 1/2, \\ 2x - 1, & 1/2 < x \le 1, \end{cases}$$

i.e., if $x = .d_1 d_2 d_3 \ldots$ is the binary decimal expansion of $x$ then $\tau x =
.d_2 d_3 \ldots$. Although $\tau(1/2)$ is not uniquely defined, this ambiguity will not
pose any problems in the analysis.

We say that a function $f$ is Hölder continuous if there exist constants
$K$, $\alpha$ such that $|f(x) - f(y)| \le K |x - y|^\alpha$ for all $x, y \in \mathbb{R}$. The largest such
exponent $\alpha$ is the Hölder exponent and the corresponding smallest constant
$K$ is the Hölder constant.

The relationship between the dilation equation (1.1) and the matrices
$T_0$, $T_1$ is given in the following result from [16].


Proposition 3.1.

1) Assume $f$ is a continuous and integrable scaling function. Define the
vector-valued function $v(x)$ for $x \in [0, 1]$ by

$$v(x) = \begin{pmatrix} f(x) \\ f(x+1) \\ \vdots \\ f(x+N-1) \end{pmatrix}.$$

Then $v$ is continuous on $[0, 1]$ and satisfies

$$\begin{aligned}
v_1(0) &= v_N(1) = 0, \\
v_{i+1}(0) &= v_i(1) \quad \text{for } i = 1, \ldots, N-1,
\end{aligned} \tag{3.1}$$

$$\begin{aligned}
v(x) &= T_{d_1} v(\tau x) \quad \text{for } x = .d_1 d_2 \ldots \in [0, 1],\ x \ne 1/2, \\
v(1/2) &= T_0 v(1) = T_1 v(0),
\end{aligned} \tag{3.2}$$

where $v_i(x)$ is the $i$th component of $v(x)$. Moreover, if $f$ is Hölder
continuous with Hölder exponent $\alpha$ then the same is true of $v$.

2) Assume $v$ is a continuous vector-valued function on $[0, 1]$ satisfying
(3.2). Define the function $f$ by

$$f(x) = \begin{cases} 0, & x \le 0 \text{ or } x \ge N, \\ v_i(x - i + 1), & i - 1 \le x \le i,\ i = 1, \ldots, N. \end{cases} \tag{3.3}$$

Then $f$ is a continuous and integrable scaling function. Moreover, if
$v$ is Hölder continuous with Hölder exponent $\alpha$ then the same is true
of $f$.


The fundamental theorem on the existence of continuous, integrable
scaling functions is the following result from [16]. The notation used in the
theorem is as follows. Let $V$ denote the subspace

$$V = \{u \in \mathbb{R}^N : u_1 + \cdots + u_N = 0\},$$

and let $M$ be the $(N-1) \times (N-1)$ matrix $M_{ij} = c_{2i-j}$. A point $x = .d_1 \ldots d_m \in
[0, 1]$ with a finite binary decimal expansion is called a dyadic point.

Theorem 3.2. Fix any norm $\|\cdot\|$ on $\mathbb{R}^N$ and assume there exist constants
$C > 0$ and $0 < \lambda < 1$ such that

$$\|(T_{d_1} \cdots T_{d_m})|_V\| \le C \lambda^m \tag{3.4}$$

for every choice of $d_1, \ldots, d_m \in \{0, 1\}$ and every $m > 0$. Then the following
statements are true.

1) 1 is a simple eigenvalue of $T_0$, $T_1$, and $M$.

2) $M$ has a right eigenvector $(a_1, \ldots, a_{N-1})^T$ for the eigenvalue 1 such
that $a_1 + \cdots + a_{N-1} = 1$.

3) Set $v(0) = (0, a_1, \ldots, a_{N-1})^T$ and define $v(x)$ for $x = .d_1 \ldots d_m \in [0, 1]$
by

$$v(x) = T_{d_1} \cdots T_{d_m} v(0).$$

Then $v_1(x) + \cdots + v_N(x) = 1$ for every such $x$.

4) $v$ is bounded on the set of dyadic points in $[0, 1]$.

5) $v$ is Hölder continuous on the set of dyadic points in $[0, 1]$ with Hölder
exponent $\alpha \ge -\log_2 \lambda$, and has a unique continuous extension to $[0, 1]$
which is Hölder continuous with the same exponent $\alpha$.

6) $v$ satisfies (3.2), and therefore the function $f$ defined by (3.3) is a
continuous, integrable scaling function and is Hölder continuous with
exponent $\alpha$.

Proof. Full details of the proof can be found in [16]; we sketch some selected
points below.

1) follows from the fact that $V$ has dimension $N - 1$ in $\mathbb{R}^N$ and that $M$
is a submatrix of both $T_0$ and $T_1$.

2) follows from 1) and the fact that $(1, \ldots, 1)$ is a left eigenvector for $M$
for the eigenvalue 1.

3) Since $(1, \ldots, 1)$ is a common left eigenvector for $T_0$ and $T_1$ for the
eigenvalue 1,

$$\begin{aligned}
v_1(x) + \cdots + v_N(x) &= (1, \ldots, 1)\, v(x) \\
&= (1, \ldots, 1)\, T_{d_1} \cdots T_{d_m} v(0) \\
&= (1, \ldots, 1)\, v(0) \\
&= 1.
\end{aligned}$$

5) Choose any dyadic $x = .d_1 \ldots d_k \in [0, 1]$ and assume $y > x$ is also
dyadic. If $2^{-m-1} \le y - x < 2^{-m}$ with $m > k$ then $x = .d_1 \ldots d_m$ and
$y = .d_1 \ldots d_m d_{m+1} \ldots d_{m+j}$ for some $j$. From 3), $v(\tau^m y) - v(0) \in V$, so

$$\begin{aligned}
\|v(y) - v(x)\| &= \|T_{d_1} \cdots T_{d_m} (v(\tau^m y) - v(0))\| \\
&\le \|(T_{d_1} \cdots T_{d_m})|_V\|\, \|v(\tau^m y) - v(0)\| \\
&\le 2LC\lambda^m \\
&= 2LC\lambda^{-1} \lambda^{m+1} \\
&\le 2LC\lambda^{-1} |y - x|^{-\log_2 \lambda},
\end{aligned}$$

where $L = \sup\{\|v(t)\| : \text{dyadic } t \in [0, 1]\} < \infty$ by 4). Thus $v$ is Hölder
continuous from the right on the set of dyadic points in $[0, 1]$ with Hölder
exponent $\alpha \ge -\log_2 \lambda$. A similar proof establishes Hölder continuity from
the left.

6) Given $x = .d_1 \ldots d_m$ dyadic, we have $v(x) = T_{d_1} (T_{d_2} \cdots T_{d_m} v(0)) =
T_{d_1} v(\tau x)$. By continuity, this holds for all $x \in [0, 1]$. ∎
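
Part 3) of Theorem 3.2 gives a direct way to evaluate a scaling function at dyadic points. The sketch below is restricted to $N = 3$ and is illustrative only: the function names are assumptions, and the formula $v(0) = (0,\ c_0/(c_0 + c_3),\ c_3/(c_0 + c_3))$ is obtained here by solving $Ma = a$, $a_1 + a_2 = 1$ by hand for the $2 \times 2$ matrix $M$, which requires $c_0 + c_3 \ne 0$. It does not check hypothesis (3.4).

```python
import math


def t_matrices(c0, c3):
    """T_0 and T_1 for N = 3, with c1 = 1 - c3 and c2 = 1 - c0."""
    c1, c2 = 1.0 - c3, 1.0 - c0
    t0 = [[c0, 0.0, 0.0], [c2, c1, c0], [0.0, c3, c2]]
    t1 = [[c1, c0, 0.0], [c3, c2, c1], [0.0, 0.0, c3]]
    return t0, t1


def matvec(a, v):
    """Matrix-vector product for nested lists."""
    return [sum(a[i][j] * v[j] for j in range(len(v))) for i in range(len(a))]


def v_at_dyadic(c0, c3, digits):
    """v(x) for x = .d_1 ... d_m, computed as T_{d_1} ... T_{d_m} v(0)."""
    t = t_matrices(c0, c3)
    # v(0) = (0, a_1, a_2) with M a = a and a_1 + a_2 = 1 (N = 3 only).
    v = [0.0, c0 / (c0 + c3), c3 / (c0 + c3)]
    for d in reversed(digits):  # apply T_{d_m} first and T_{d_1} last
        v = matvec(t[d], v)
    return v


SQRT3 = math.sqrt(3.0)
C0, C3 = (1 + SQRT3) / 4, (1 - SQRT3) / 4  # Daubechies four-coefficient case
v_half = v_at_dyadic(C0, C3, [1])          # (f(1/2), f(3/2), f(5/2)) via (3.3)
```

For the Daubechies coefficients this gives $f(1/2) = (2+\sqrt{3})/4$, $f(3/2) = 0$ and $f(5/2) = (2-\sqrt{3})/4$, and the components of $v(x)$ sum to 1 at every dyadic point, as stated in part 3).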

Examples of norms on $\mathbb{R}^N$ are $\|u\|_p = (|u_1|^p + \cdots + |u_N|^p)^{1/p}$ for
$1 \le p < \infty$ and $\|u\|_\infty = \max\{|u_1|, \ldots, |u_N|\}$.

Condition (3.4) is most easily analyzed in a spectral form, as follows.
The joint spectral radius of a set of $N \times N$ matrices $\{A_0, \ldots, A_n\}$ is the
straightforward generalization of the usual spectral radius of a single matrix,
namely,

$$\hat\rho(A_0, \ldots, A_n) = \limsup_{m \to \infty} \lambda_m,$$

where

$$\lambda_m = \max_{d_j \in \{0, \ldots, n\}} \|A_{d_1} \cdots A_{d_m}\|^{1/m}.$$

The joint spectral radius was first introduced by Rota and Strang [32]. Recent
articles are [3, 14].

Lemma 3.3.

1) For every $\lambda > \hat\rho(A_0, \ldots, A_n)$ there exists a constant $C > 0$ such that
$\lambda_m^m \le C \lambda^m$ for every $m$.

2) If there exist $C, \lambda > 0$ such that $\lambda_m^m \le C \lambda^m$ for every $m$ then
$\hat\rho(A_0, \ldots, A_n) \le \lambda$.

It follows from Lemma 3.3 that (3.4) is equivalent to $\hat\rho(T_0|_V, T_1|_V) < 1$
(however, $\hat\rho(T_0|_V, T_1|_V) \le 1$ is not equivalent to $\lambda_m^m \le C$ for every $m$).

The joint spectral radius can be difficult to compute, except in special
cases. For a single matrix $A$, $\hat\rho(A)$ is simply the usual spectral radius $\rho(A)$ of $A$
and is therefore the largest of the absolute values of the eigenvalues of $A$.
This is not true in general, i.e., if we define

$$\sigma_m = \max_{d_j \in \{0, \ldots, n\}} \rho(A_{d_1} \cdots A_{d_m})^{1/m},$$

then in general $\hat\rho(A_0, \ldots, A_n) \ne \sigma_1 = \max\{\rho(A_0), \ldots, \rho(A_n)\}$. However, we do have
the following, cf. [16].

Lemma 3.4.

1) $\sigma_m \le \hat\rho(A_0, \ldots, A_n) \le \lambda_m$ for every $m$.

2) $\hat\rho(A_0, \ldots, A_n)$ is independent of the choice of basis, i.e., if $B$ is any
invertible matrix then $\hat\rho(BA_0B^{-1}, \ldots, BA_nB^{-1}) = \hat\rho(A_0, \ldots, A_n)$.

3) If there exists an invertible matrix $B$ such that $BA_0B^{-1}, \ldots, BA_nB^{-1}$
are all simultaneously symmetric, then $\hat\rho(A_0, \ldots, A_n) = \sigma_1$.

Berger and Wang have proved that $\hat\rho(A_0, \ldots, A_n) = \limsup \sigma_m$, and
therefore $\hat\rho(A_0, \ldots, A_n) = \sup \sigma_m$ [3].

We return now to consideration of the matrices $T_0$, $T_1$. Since $V$ has dimension
$N - 1$, an appropriate change of basis gives $\hat\rho(T_0|_V, T_1|_V) = \hat\rho(S_0, S_1)$
where $S_0$, $S_1$ are $(N-1) \times (N-1)$ matrices (not necessarily unique).

Remark 3.5. For $N = 3$ we can set

$$S_0 = \begin{pmatrix} c_0 & 0 \\ -c_3 & 1 - c_0 - c_3 \end{pmatrix}
\quad \text{and} \quad
S_1 = \begin{pmatrix} 1 - c_0 - c_3 & -c_0 \\ 0 & c_3 \end{pmatrix},$$

cf. [9]. The shaded area in Figure 3.1 shows the set SS of points $(c_0, c_3)$
for which $S_0$ and $S_1$ can be simultaneously symmetrized with $\hat\rho(S_0, S_1) <
1$. Continuous, integrable scaling functions therefore exist for all points in
this region.
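
That $S_0$, $S_1$ represent $T_0|_V$, $T_1|_V$ can be confirmed by checking $T_i B = B S_i$ for a matrix $B$ whose columns form a basis of $V$. The basis $\{(1, -1, 0)^T, (0, -1, 1)^T\}$ used below is a choice made for this sketch (it is one basis that yields exactly the matrices of Remark 3.5), and the function names are illustrative.

```python
def t_matrices(c0, c3):
    """T_0 and T_1 for N = 3, with c1 = 1 - c3 and c2 = 1 - c0."""
    c1, c2 = 1.0 - c3, 1.0 - c0
    t0 = [[c0, 0.0, 0.0], [c2, c1, c0], [0.0, c3, c2]]
    t1 = [[c1, c0, 0.0], [c3, c2, c1], [0.0, 0.0, c3]]
    return t0, t1


def s_matrices(c0, c3):
    """S_0 and S_1 as in Remark 3.5."""
    s = 1.0 - c0 - c3
    return [[c0, 0.0], [-c3, s]], [[s, -c0], [0.0, c3]]


def matmul(a, b):
    """Product of nested-list matrices of compatible shapes."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


# Columns (1, -1, 0) and (0, -1, 1) span V = {u : u1 + u2 + u3 = 0}.
BASIS = [[1.0, 0.0], [-1.0, -1.0], [0.0, 1.0]]


def intertwining_error(c0, c3):
    """Largest entry of |T_i B - B S_i| over i = 0, 1."""
    err = 0.0
    for t, s in zip(t_matrices(c0, c3), s_matrices(c0, c3)):
        left, right = matmul(t, BASIS), matmul(BASIS, s)
        err = max(err, max(abs(left[i][j] - right[i][j])
                           for i in range(3) for j in range(2)))
    return err
```

The error vanishes up to rounding for every $(c_0, c_3)$, since the identity is algebraic once $c_1 = 1 - c_3$ and $c_2 = 1 - c_0$.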
Figure 3.1: Region SS where simultaneous symmetrization is possible
and leads to continuous scaling functions (shaded area).

In the regions where simultaneous symmetrization is not possible,
Lemma 3.4 can be used to estimate the joint spectral radius.

Remark 3.6. Set $N = 3$, and let $C_{p,m}$ be the set of points $(c_0, c_3)$ such
that $\hat\rho(S_0, S_1) \le \lambda_m < 1$ with the choice of norm $\|\cdot\|_p$. By Theorem 3.2,
continuous, integrable scaling functions exist for all points in any $C_{p,m}$.
Figures 3.2-3.4 show $C_{p,1}$ for several choices of $p$, i.e., the sets obtained by
considering the matrices $S_0$, $S_1$ directly (since $\lambda_1 = \max\{\|S_0\|_p, \|S_1\|_p\}$).

Figure 3.5 shows the region $C_{2,16}$, obtained by considering, for each
point $(c_0, c_3)$, the Euclidean space norm $\|\cdot\|_2$ of all 65,536 possible products
$S_{d_1} \cdots S_{d_{16}}$ of $S_0$ and $S_1$ of length 16.

The union of the regions shown in Figures 3.2-3.5, plus the region SS
shown in Figure 3.1, is shown in Figure 3.6. Continuous, integrable scaling
functions therefore exist for all points in the shaded area in Figure 3.6. By
Remark 3.10, there are no continuous, integrable scaling functions on or
outside the solid boundary shown in Figure 3.6.

Note that half of the circle of orthogonality lies inside the shaded area
in Figure 3.6, and half lies outside the solid line. Therefore there exist many
wavelet bases with $N = 3$ for which the wavelet is continuous, cf. Figures
3.7 and 3.8.
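
The quantities $\lambda_m$ and $\sigma_m$ of Lemma 3.4 can be computed by brute force for small $m$. The sketch below is illustrative and makes two choices not fixed by the text: it uses the matrix 1-norm (maximum absolute column sum), and it takes $S_0$, $S_1$ in the form of Remark 3.5. It enumerates all $2^m$ products, so it is practical only for short lengths.

```python
import cmath
import itertools
import math


def s_matrices(c0, c3):
    """S_0 and S_1 as in Remark 3.5."""
    s = 1.0 - c0 - c3
    return [[c0, 0.0], [-c3, s]], [[s, -c0], [0.0, c3]]


def matmul2(a, b):
    """Product of two 2 x 2 matrices."""
    return [[a[i][0] * b[0][j] + a[i][1] * b[1][j] for j in range(2)] for i in range(2)]


def norm1(a):
    """Matrix 1-norm: maximum absolute column sum."""
    return max(abs(a[0][j]) + abs(a[1][j]) for j in range(2))


def spectral_radius(a):
    """Largest eigenvalue modulus of a 2 x 2 matrix."""
    tr = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    root = cmath.sqrt(tr * tr - 4.0 * det)
    return max(abs((tr + root) / 2.0), abs((tr - root) / 2.0))


def lambda_sigma(mats, m):
    """Return (lambda_m, sigma_m) over all products of length m."""
    lam = sig = 0.0
    for word in itertools.product(range(len(mats)), repeat=m):
        p = mats[word[0]]
        for d in word[1:]:
            p = matmul2(p, mats[d])
        lam = max(lam, norm1(p) ** (1.0 / m))
        sig = max(sig, spectral_radius(p) ** (1.0 / m))
    return lam, sig


SQRT3 = math.sqrt(3.0)
C0, C3 = (1 + SQRT3) / 4, (1 - SQRT3) / 4  # Daubechies four-coefficient case
S = s_matrices(C0, C3)
```

By Lemma 3.4, each call brackets the joint spectral radius: $\sigma_m \le \hat\rho(S_0, S_1) \le \lambda_m$. For the Daubechies coefficients with this norm, $\lambda_1 = \sqrt{3}/2$ and $\sigma_1 = c_0$.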

For large $m$, direct computation of $\lambda_m$ is impractical. The following
algorithm can be used to select a subset of matrices which can be used to
estimate $\hat\rho(A_0, \ldots, A_n)$ [10], cf. [16].

Proposition 3.7. Given $\rho > \hat\rho(A_0, \ldots, A_n)$. For each of the matrices
$A_0, \ldots, A_n$ in turn, implement the following recursion.

• Given a product $P = A_{d_1} \cdots A_{d_m}$. If $\|P\|^{1/m} < \rho$ then keep $P$ as a
building block. Otherwise, repeat this step with each of the products
$PA_0, \ldots, PA_n$ in turn.

Label the resulting set of building blocks $P_1, \ldots, P_l$ and let $m_j$ be the length
of the product $P_j$. Then the following statements hold.

1) There is an $r > 0$ such that if $P = A_{d_1} \cdots A_{d_m}$ is any product of the
matrices $A_0, \ldots, A_n$, then $P = P_{j_1} \cdots P_{j_k} R$ where $R$ is some product of
at most $r$ of the matrices $A_0, \ldots, A_n$.

2) $\hat\rho(A_0, \ldots, A_n) \le \max\{\|P_1\|^{1/m_1}, \ldots, \|P_l\|^{1/m_l}\}$.

This algorithm can be used to significantly shorten the time required
to estimate a joint spectral radius.
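
The recursion of Proposition 3.7 can be sketched as follows for a pair of $2 \times 2$ matrices. Several details are assumptions of this sketch rather than part of the proposition: the matrix 1-norm is used, an explicit stack replaces the recursion, and a cap `max_len` guards against a threshold $\rho$ that is not actually larger than the joint spectral radius.

```python
import math


def s_matrices(c0, c3):
    """S_0 and S_1 as in Remark 3.5."""
    s = 1.0 - c0 - c3
    return [[c0, 0.0], [-c3, s]], [[s, -c0], [0.0, c3]]


def matmul2(a, b):
    """Product of two 2 x 2 matrices."""
    return [[a[i][0] * b[0][j] + a[i][1] * b[1][j] for j in range(2)] for i in range(2)]


def norm1(a):
    """Matrix 1-norm: maximum absolute column sum."""
    return max(abs(a[0][j]) + abs(a[1][j]) for j in range(2))


def building_blocks(mats, rho, max_len=40):
    """Collect products P of length m with ||P||^(1/m) < rho, extending on the right otherwise."""
    blocks = []
    stack = [(a, 1) for a in mats]
    while stack:
        p, m = stack.pop()
        if norm1(p) ** (1.0 / m) < rho:
            blocks.append((p, m))          # keep P as a building block
        elif m >= max_len:
            raise RuntimeError("no building block within max_len; rho may be too small")
        else:
            stack.extend((matmul2(p, a), m + 1) for a in mats)  # try P*A_0, ..., P*A_n
    return blocks


def upper_bound(blocks):
    """max_j ||P_j||^(1/m_j), an upper bound for the joint spectral radius."""
    return max(norm1(p) ** (1.0 / m) for p, m in blocks)


SQRT3 = math.sqrt(3.0)
C0, C3 = (1 + SQRT3) / 4, (1 - SQRT3) / 4  # Daubechies four-coefficient case
blocks = building_blocks(s_matrices(C0, C3), rho=0.8)
bound = upper_bound(blocks)
```

By part 2) of the proposition, `bound` is an upper estimate for $\hat\rho(S_0, S_1)$, and by construction it is below the chosen threshold. Lowering `rho` toward the true joint spectral radius tightens the estimate at the cost of longer products.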

Remark 3.8. For N = 3 and $(c_0, c_3) = (.6, -.2)$, for which simultaneous
symmetrization is not possible, we compute (using the norm $\|\cdot\|_1$) $\hat\rho_1 \approx .737$
and $\hat\rho_{13} = .682$. The computation of $\hat\rho_{13}$ required the calculation of 8192
matrix products; however, the algorithm given in Proposition 3.7 equals
this estimate after only 94 matrix product computations. A deeper search,
with a maximum matrix product length of 73, required only 14156 matrix
product computations and resulted in the estimate $\hat\rho(S_0, S_1) \le .661$.
Even if $\hat\rho_{73}$ could be computed it would not improve this estimate, e.g.,
l|S^SiSp‘*SiSj‘*SiSj'SiS^‘SiSp^Stjl',  =  .663.  These  computations,  and 
the significance of the point $(c_0, c_3) = (.6, -.2)$, are explained in detail in
[9]; note, however, that the Hölder exponent of continuity for the scaling
function determined by the coefficients $(c_0, c_3) = (.6, -.2)$ is at least
$-\log_2 .661 \approx .598$, and therefore this scaling function is smoother than the
standard four-coefficient example, the Daubechies scaling function $D_4$, which
is determined by the coefficients $(c_0, c_3) = ((1 + \sqrt{3})/4, (1 - \sqrt{3})/4)$, and
whose Hölder exponent of continuity is approximately .550. These two scaling
functions are shown in Figures 3.7 and 3.8. Each of these two choices of
coefficients lies on the circle of orthogonality and determines a multiresolution
analysis for $L^2(\mathbb{R})$.

Theorem  3.2  is  extended  to  a  necessary  and  sufficient  condition  in  [10]. 


We briefly indicate now the method used to obtain the converse result. Given
an $N \times N$ matrix $A$ and an eigenvalue $\lambda$ of $A$, set
$U_\lambda = \{u \in \mathbb{C}^N : (A - \lambda)^k u = 0 \text{ for some } k > 0\}$.
By standard Jordan decomposition techniques we
can write $\mathbb{C}^N = U_\lambda \oplus W$, where $W$ is a unique $A$-invariant subspace of $\mathbb{C}^N$.
Given $v \in \mathbb{C}^N$ we say that $v$ has a component in $U_\lambda$ if $v = u + w$ where
$u \in U_\lambda$, $w \in W$, and $u \ne 0$. The following result is from [10].

Theorem 3.9. Assume $v$ is a continuous vector-valued function on [0, 1]
such that (3.2) holds, and let $T = T_{d_1} \cdots T_{d_m}$ be any fixed product of the
matrices $T_0$, $T_1$. Let $x \in [0, 1]$ be that point whose binary decimal expansion
is $x = .d_1 \ldots d_m d_1 \ldots d_m \ldots$. If

1) $\lambda$ is an eigenvalue of $T|_V$, and

2) there is some $z \in [0, 1]$ such that $v(x) - v(z)$ has a component in $U_\lambda$,

then $|\lambda| < 1$ and the Hölder exponent of continuity of $v$ is at most
$-\log_2 |\lambda|^{1/m}$.

Figure 3.6: Union of the sets SS, Ci.i, Ci.i, Coo.i- and C2,u; (shaded
area); boundary of the set $E_{16}$ (solid line).


Since $\hat\rho(T_0|_V, T_1|_V) = \sup_m \rho_m$, where $\rho_m^m$ is the supremum of the absolute
values of the eigenvalues of every $(T_{d_1} \cdots T_{d_m})|_V$, it follows that if the
hypotheses of Theorem 3.9 are satisfied for each product $T = T_{d_1} \cdots T_{d_m}$, then
$\hat\rho(T_0|_V, T_1|_V) \le 1$ with $\rho_m < 1$ for all $m$, and the Hölder exponent of $v$ satisfies
$\alpha \le -\log_2 \hat\rho(T_0|_V, T_1|_V)$. Therefore, if the hypotheses of Theorem 3.9 are
satisfied for each product $T = T_{d_1} \cdots T_{d_m}$, then Theorem 3.9 is the converse
to Theorem 3.2, except for the possibility of one special case, namely,

$$\sup_m \rho_m = 1 \quad \text{and} \quad \rho_m < 1 \text{ for all } m.$$

It is unknown whether this special case can actually occur. It is proven in
[9] that the hypotheses of Theorem 3.9 are always satisfied if $N \le 3$ and it is
conjectured in [10] that they are always satisfied in general except for a set
of coefficients of measure zero. Methods for determining the validity of the
hypotheses of Theorem 3.9 for any specific choice of coefficients are given
in [10].

Remark 3.10. Set N = 3 and define $E_m = \{(c_0, c_3) : \rho_m \ge 1\}$. By Theorem
3.6, no dilation equation determined by a point in $E_m$ can have a continuous,
integrable solution. The set $E_1$ is precisely the boundary and exterior of the
triangle shown in Figure 1.1. The solid line in Figure 3.6 shows a numerical
approximation of the boundary of $E_{16}$ [9]. By previous remarks, continuous,
integrable scaling functions do exist in the shaded region in Figure 3.6.

The results of this section can be extended from consideration of continuous
solutions to n-times differentiable solutions. If $f$ is such a solution
then its derivatives $f^{(j)}$ satisfy the dilation equations

$$f^{(j)}(t) = \sum_k 2^j c_k f^{(j)}(2t - k).$$

Therefore the vector $(f^{(j)}(1), \ldots, f^{(j)}(N - 1))^t$ is a right eigenvector for the
matrix $M$ for the eigenvalue $2^{-j}$. As $M$ is an $(N - 1) \times (N - 1)$ matrix, $f$ can
therefore possess at most $N - 2$ derivatives. This can always be achieved for
an appropriate choice of coefficients [16].

The following modification of Theorem 3.2 for the case of higher derivatives
is from [16].

Theorem 3.11. Assume that the coefficients $\{c_k\}$ satisfy the sum rules
$\sum_k (-1)^k k^j c_k = 0$ for $j = 0, \ldots, n$. Define
$V_n = \{u \in \mathbb{C}^N : e_j \cdot u = 0,\ j = 0, \ldots, n\}$, where $e_j = (1^j, 2^j, \ldots, N^j)$.
If $\hat\rho(T_0|_{V_n}, T_1|_{V_n}) < 2^{-n}$ then there
exists an n-times differentiable solution $f$ to (1.1), and the n-th derivative
$f^{(n)}$ of $f$ is Hölder continuous with exponent $\alpha \ge -\log_2 \bigl(2^n \hat\rho(T_0|_{V_n}, T_1|_{V_n})\bigr)$.

Remark  3.12.  For  the  case  N  =  3,  differentiable  solutions  can  exist  only  on 
the solid portion of the line shown in Figure 1.1. None of these solutions can
be  twice  differentiable.  In  particular,  for  N  =  3,  no  wavelet  which  generates 
an  affine  orthonormal  basis  can  be  differentiable  since  wavelets  must  be 
derived  from  points  lying  inside  of  the  circle  of  orthogonality. 


4.  Orthogonality 

In this section we consider the relationship between the choice of coefficients
$\{c_k\}$ and frame or basis properties of the associated wavelet. We assume N
is odd in this section.

We require the following lemmas. $C_c(\mathbb{R})$ denotes the space of all continuous
functions on $\mathbb{R}$ which have compact support. The proof of the first
lemma can be found in [27].

Lemma 4.1. If $(\{V_n\}, f)$ is a multiresolution analysis then $\int f(t)\,dt = 1$.

Lemma 4.2. If (1.3) holds then the canonical scaling function $f$ satisfies
$\sum_k f(t - k) = 1$ a.e.

Proof. Set $\theta_0 = \chi_{[0,1)}$ and $\theta_j(t) = \sum_k c_k \theta_{j-1}(2t - k)$. Since $\hat\theta_0$ is continuous,
$\hat\theta_0(0) = 1$, and $\hat\theta_j(2\gamma) = m_0(\gamma)\,\hat\theta_{j-1}(\gamma)$, it follows that $\theta_j \to f$ weakly in
$L^2(\mathbb{R})$, i.e., $\langle \theta_j, h \rangle \to \langle f, h \rangle$ for all $h \in L^2(\mathbb{R})$. Note that $\sum_k \theta_0(t - k) = 1$ a.e.;
by induction, the same is true of $\theta_j$, and hence of $f$. ∎

Next, we establish necessary conditions on the coefficients $\{c_k\}$ in order
that a multiresolution analysis exist.

Proposition 4.3. If the coefficients $\{c_k\}$ determine a multiresolution analysis
then (1.3) and (1.4) hold. The converse is not true.

Proof. Integrating both sides of the dilation equation implies that $\sum_k c_k = 2$,
since $\int f(t)\,dt$ is nonzero by Lemma 4.1. Since $f$ is orthogonal to its integer
translates,

$$2\delta_{0l} = 2\int f(t)\,f(t + l)\,dt = 2\sum_{j,k} c_j c_k \int f(2t - j)\,f(2t + 2l - k)\,dt = \sum_k c_k c_{k+2l},$$

so (1.4) holds. This, combined with the fact $\sum_k c_k = 2$, implies (1.3).

To see that (1.3) and (1.4) are not sufficient, consider the coefficient
choice $c_0 = 1$, $c_1 = \cdots = c_{N-1} = 0$, $c_N = 1$. These coefficients satisfy
(1.3) and (1.4), yet the canonical scaling function $f = (1/N)\,\chi_{[0,N)}$ is not
orthogonal to its integer translates if $N > 1$. ∎


Remark 4.4. For N = 3, the set of points in the $(c_0, c_3)$-plane which satisfy
both (1.3) and (1.4) is precisely the circle of orthogonality shown in Figure 1.1.
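Remark 4.4 can be checked numerically. The sketch below rests on assumptions, since equations (1.3) and (1.4) are stated outside this section: it takes (1.3) to mean $c_0 + c_2 = c_1 + c_3 = 1$ and (1.4) to mean $\sum_k c_k c_{k+2l} = 2\delta_{0l}$ (the form used in the proof of Proposition 4.3), and it takes the circle to be $(c_0 - 1/2)^2 + (c_3 - 1/2)^2 = 1/2$, which is what those two conditions reduce to for N = 3. The function names are hypothetical.

```python
def coefficients(c0: float, c3: float) -> list:
    """N = 3 coefficients fixed by (c0, c3), assuming (1.3): c0 + c2 = c1 + c3 = 1."""
    return [c0, 1.0 - c3, 1.0 - c0, c3]

def satisfies_1_4(c: list, tol: float = 1e-12) -> bool:
    """Check sum_k c_k c_{k+2l} = 2 * delta_{0l} for every shift l >= 0."""
    n = len(c)
    for l in range((n - 1) // 2 + 1):
        s = sum(c[k] * c[k + 2 * l] for k in range(n - 2 * l))
        target = 2.0 if l == 0 else 0.0
        if abs(s - target) > tol:
            return False
    return True

def on_circle(c0: float, c3: float, tol: float = 1e-12) -> bool:
    """Assumed form of the circle of orthogonality in the (c0, c3)-plane."""
    return abs((c0 - 0.5) ** 2 + (c3 - 0.5) ** 2 - 0.5) <= tol

# The point of Remark 3.8 lies on the circle; a nearby point does not.
print(on_circle(0.6, -0.2), satisfies_1_4(coefficients(0.6, -0.2)))
print(on_circle(0.6, 0.2), satisfies_1_4(coefficients(0.6, 0.2)))
```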

Equations (1.3) and (1.4) are equivalent to

$$m_0(0) = 1 \quad \text{and} \quad m_0(\pi) = 0 \qquad (4.1)$$

and

$$|m_0(\gamma)|^2 + |m_0(\gamma + \pi)|^2 = 1 \quad \text{for all } \gamma. \qquad (4.2)$$

Equation (4.2) implies that, in signal processing terms, $m_0(\gamma)$ and $m_0(\gamma + \pi)$
form a quadrature mirror filter pair. Such filter pairs induce fast digital signal
processing algorithms, e.g., subband coding. Daubechies has characterized
those trigonometric polynomials $m_0$ which satisfy (4.1) and (4.2) in [12].
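Conditions (4.1) and (4.2) are easy to test numerically for a given coefficient set. The sketch below assumes the symbol is $m_0(\gamma) = \frac12 \sum_k c_k e^{ik\gamma}$; its definition lies outside this section, and the sign convention in the exponent does not affect either condition. The full set of four $D_4$ coefficients is obtained from the pair $(c_0, c_3)$ of Remark 3.8 by assuming $c_1 = 1 - c_3$ and $c_2 = 1 - c_0$.

```python
import cmath
import math

def m0(c: list, gamma: float) -> complex:
    """Assumed symbol m_0(gamma) = (1/2) * sum_k c_k * exp(i k gamma)."""
    return 0.5 * sum(ck * cmath.exp(1j * k * gamma) for k, ck in enumerate(c))

def qmf_defect(c: list, samples: int = 256) -> float:
    """Largest deviation of |m0(g)|^2 + |m0(g + pi)|^2 from 1 on a grid, cf. (4.2)."""
    worst = 0.0
    for i in range(samples):
        g = 2.0 * math.pi * i / samples
        value = abs(m0(c, g)) ** 2 + abs(m0(c, g + math.pi)) ** 2
        worst = max(worst, abs(value - 1.0))
    return worst

s3 = math.sqrt(3.0)
d4 = [(1 + s3) / 4, (3 + s3) / 4, (3 - s3) / 4, (1 - s3) / 4]

print(abs(m0(d4, 0.0) - 1.0), abs(m0(d4, math.pi)))  # both near zero, cf. (4.1)
print(qmf_defect(d4))                                # near zero, cf. (4.2)
```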

Although (1.3) and (1.4) are not sufficient to ensure that $\{c_k\}$ will generate
a multiresolution analysis (and therefore that $\{g_{nk}\}_{n,k \in \mathbb{Z}}$ will be an
orthonormal basis), Lawton has proven that (1.3) and (1.4) are sufficient to
ensure that the sequence $\{g_{nk}\}_{n,k \in \mathbb{Z}}$ will satisfy the reconstruction property
(1.5) of an orthonormal basis. Such a sequence is called a tight frame. (1.5)
alone does not imply that $\{g_{nk}\}_{n,k \in \mathbb{Z}}$ is an orthogonal sequence or a basis,
i.e., in general the summation in (1.5) is not unique. See [18] or [23] for
exposition on frames and their properties.

The  following  theorem  and  proof  are  from  [24]. 

Theorem 4.5. If the coefficients $\{c_k\}$ satisfy (1.3) and (1.4) then $\{g_{nk}\}_{n,k \in \mathbb{Z}}$ is
a tight frame for $L^2(\mathbb{R})$.

Proof.  We  proceed  in  four  steps. 

1) From (4.2) and Theorems 2.6 and 2.1, the canonical scaling distribution
$f$ is an integrable function with support contained in [0, N] and satisfies
$\int f(t)\,dt = 1$.

2) Define the operator $P_n : L^2(\mathbb{R}) \to L^2(\mathbb{R})$ by

$$P_n h = \sum_k \langle h, f_{nk} \rangle f_{nk}. \qquad (4.3)$$

We claim that $P_n \to I$ as $n \to +\infty$ and $P_n \to 0$ as $n \to -\infty$, where $I$ is the
identity operator on $L^2(\mathbb{R})$.

First, however, we show that the operators $\{P_n\}$ are uniformly bounded
in norm. Since $\mathrm{supp}(f_{nk}) \subset I_{nk} = [k/2^n, (k + N)/2^n]$, with $n$ fixed each
set $\mathrm{supp}(f_{nk})$ can intersect at most N other $\mathrm{supp}(f_{nl})$. Therefore, for any
scalars $\{a_k\}$,

$$\Bigl\| \sum_k a_k f_{nk} \Bigr\|_2 \le N^{1/2} \Bigl( \sum_k |a_k|^2 \|f_{nk}\|_2^2 \Bigr)^{1/2} = N^{1/2} \|f\|_2 \Bigl( \sum_k |a_k|^2 \Bigr)^{1/2}, \qquad (4.4)$$

e.g., [22, Prop. 2.4.10]. Therefore,

$$\|P_n h\|_2 \le N^{1/2} \|f\|_2 \Bigl( \sum_k |\langle h, f_{nk} \rangle|^2 \Bigr)^{1/2}. \qquad (4.5)$$

Now, for each $r = 0, \ldots, N - 1$, the sequence $\{f_{n,lN+r}\}_{l \in \mathbb{Z}}$ is an orthogonal
collection of functions since their supports are disjoint. Therefore, by Bessel's
inequality, $\sum_l |\langle h, f_{n,lN+r} \rangle|^2 \le \|f\|_2^2 \|h\|_2^2$. Combining this with (4.5) we
obtain $\|P_n h\|_2 \le N \|f\|_2^2 \|h\|_2$, and therefore $\sup_n \|P_n\| \le N \|f\|_2^2 < \infty$.

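Because the operators $\{P_n\}$ are uniformly bounded in norm, to prove
$P_n h \to h$ as $n \to +\infty$ for all $h \in L^2(\mathbb{R})$ it suffices to consider $h$ in a dense
subset of $L^2(\mathbb{R})$, say $h \in C_c(\mathbb{R})$. For such an $h$, since $\sum_k f_{nk}(t) = 2^{n/2}$ a.e.
(Lemma 4.2), we can write

$$\begin{aligned}
\|h - P_n h\|_2 &= \Bigl\| \sum_k \bigl( 2^{-n/2} h(t) - \langle h, f_{nk} \rangle \bigr) f_{nk}(t) \Bigr\|_2 \\
&\le N^{1/2} \Bigl( \sum_k \bigl\| \bigl( 2^{-n/2} h(t) - \langle h, f_{nk} \rangle \bigr) f_{nk}(t) \bigr\|_2^2 \Bigr)^{1/2} \\
&\le N^{1/2} \|f\|_2 \Bigl( \sum_k \alpha_{nk}^2 \Bigr)^{1/2}, \qquad (4.6)
\end{aligned}$$

where we have used (4.4) again and where

$$\alpha_{nk} = \sup_{t \in I_{nk}} \bigl| 2^{-n/2} h(t) - \langle h, f_{nk} \rangle \bigr|.$$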

To see that $\sum_k \alpha_{nk}^2 \to 0$ as $n \to +\infty$, define

$$\beta_{nk} = \sup_{s,t \in I_{nk}} |h(s) - h(t)|$$ and


Note that $h_n(t) \to 0$ pointwise as $n \to +\infty$ since $h \in C_c(\mathbb{R})$. Further,
|3i,k  ^  |3rvo  4nd  I„k  C  InC  for  all  k,  so  hn  ^  1^  for  n  >  0.  As  is  clearly 
integrable,  it  follows  from  the  Lebesgue  Dominated  Convergence  Theorem 
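that $\int h_n(t)\,dt \to 0$ as $n \to +\infty$. Now, since $\int f_{nk}(t)\,dt = 2^{-n/2}$ (Lemma
4.1), we have for $t \in I_{nk}$ that

$$\begin{aligned}
\bigl| 2^{-n/2} h(t) - \langle h, f_{nk} \rangle \bigr| &= \Bigl| \int_{I_{nk}} \bigl( h(t) - h(s) \bigr) f_{nk}(s)\,ds \Bigr| \\
&\le \Bigl( \int_{I_{nk}} |h(t) - h(s)|^2\,ds \Bigr)^{1/2} \Bigl( \int_{I_{nk}} |f_{nk}(s)|^2\,ds \Bigr)^{1/2} \\
&\le |I_{nk}|^{1/2}\, \beta_{nk}\, \|f\|_2.
\end{aligned}$$

Therefore, $\sum_k \alpha_{nk}^2 \le \|f\|_2^2 \int h_n(s)\,ds \to 0$ as $n \to +\infty$, which, combined
with (4.6), implies that $P_n h \to h$ in $L^2(\mathbb{R})$ as $n \to +\infty$.

A similar proof shows that $P_n h \to 0$ as $n \to -\infty$.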

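3) Define the operator $F_n : L^2(\mathbb{R}) \to L^2(\mathbb{R})$ by

$$F_n h = \sum_k \langle h, g_{nk} \rangle g_{nk}.$$

We claim then that $F_n = P_{n+1} - P_n$ for each $n \in \mathbb{Z}$. Using the dilation
equation (1.1) and the definition (1.2) we compute

$$\begin{aligned}
P_n h + F_n h &= \sum_k \langle h(t), 2^{n/2} f(2^n t - k) \rangle\, 2^{n/2} f(2^n t - k) \\
&\quad + \sum_k \langle h(t), 2^{n/2} g(2^n t - k) \rangle\, 2^{n/2} g(2^n t - k) \\
&= 2^n \sum_{k,p,q} \langle h(t), c_p f(2^{n+1} t - 2k - p) \rangle\, c_q f(2^{n+1} t - 2k - q) \\
&\quad + 2^n \sum_{k,p,q} \langle h(t), (-1)^p c_{N-p} f(2^{n+1} t - 2k - p) \rangle\, (-1)^q c_{N-q} f(2^{n+1} t - 2k - q) \\
&= 2^n \sum_{k,p,q} \bigl\langle h(t), \bigl( c_p c_q + (-1)^{p+q} c_{N-p} c_{N-q} \bigr) f(2^{n+1} t - 2k - p) \bigr\rangle\, f(2^{n+1} t - 2k - q) \\
&= \sum_{j,l} \frac12 \sum_k \bigl( c_{j-2k} c_{l-2k} + (-1)^{j+l} c_{N-j+2k} c_{N-l+2k} \bigr) \langle h(t), 2^{(n+1)/2} f(2^{n+1} t - j) \rangle\, 2^{(n+1)/2} f(2^{n+1} t - l) \\
&= \sum_{j,l} C(j,l)\, \langle h, f_{n+1,j} \rangle\, f_{n+1,l},
\end{aligned}$$

where

$$C(j,l) = \frac12 \sum_k \bigl( c_{j-2k} c_{l-2k} + (-1)^{j+l} c_{N-j+2k} c_{N-l+2k} \bigr).$$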

It suffices, therefore, to show that $C(j,l) = \delta_{jl}$. Note that by making the
change of index $m = -k + j + l - (N - 1)/2$ (recall N is odd) in the second
summation, we obtain

$$\begin{aligned}
C(2j, 2l) &= \frac12 \sum_k c_{2j-2k} c_{2l-2k} + \frac12 \sum_m c_{2j-2m+1} c_{2l-2m+1} \\
&= \frac12 \sum_k c_{2j-k} c_{2l-k} \\
&= \delta_{jl}
\end{aligned}$$

because of hypothesis (1.4). Similar calculations show that $C(2j, 2l + 1) =
C(2j + 1, 2l) = 0$ and $C(2j + 1, 2l + 1) = \delta_{jl}$, as desired.

4) From steps 2) and 3), $\sum_n F_n = \lim_{n \to \infty} (P_n - P_{-n}) = I$. That is, for
$h \in L^2(\mathbb{R})$, $h = \sum_n F_n h = \sum_{n,k} \langle h, g_{nk} \rangle g_{nk}$, whence $\{g_{nk}\}_{n,k \in \mathbb{Z}}$ is a tight
frame. ∎

Corollary to 4.5. The coefficients $\{c_k\}$ determine a multiresolution analysis
if and only if

1) (1.3) and (1.4) are satisfied, and

2) $f$ is orthogonal to each of its integer translates.

In this case, $\{g_{nk}\}_{n,k \in \mathbb{Z}}$ is an orthonormal basis for $L^2(\mathbb{R})$.


Proof. Because of Proposition 4.3 we need only prove that if 1) and 2)
hold then $\{c_k\}$ determines a multiresolution analysis. From (1.4) and the
orthogonality of the integer translates of $f$,

$$\int |f(t)|^2\,dt = \sum_{j,k} c_j c_k \int f(2t - j)\,f(2t - k)\,dt.$$

Therefore $\{f(t - k)\}_{k \in \mathbb{Z}}$ is an orthonormal set and hence is an orthonormal
basis for $V_0 = \mathrm{span}\{f(t - k)\}_{k \in \mathbb{Z}}$. Defining $V_n = \mathrm{span}\{f_{nk}\}_{k \in \mathbb{Z}}$, we have
$V_n \subset V_{n+1}$ because $f$ is a scaling function. The operator $P_n$ defined by (4.3)
is then the orthogonal projection of $L^2(\mathbb{R})$ onto $V_n$. Since $P_n \to 0$ as $n \to -\infty$
we have $\bigcap V_n = \{0\}$, and similarly $\bigcup V_n$ is dense in $L^2(\mathbb{R})$ since $P_n \to I$ as
$n \to +\infty$. Thus $(\{V_n\}, f)$ is a multiresolution analysis.

To prove that $\{g_{nk}\}_{n,k \in \mathbb{Z}}$ is an orthonormal basis, note that from (1.2),
(1.4), and the orthogonality of the integer translates of $f$,

$$\|g\|_2^2 = \sum_{j,k} (-1)^{j+k} c_{N-j} c_{N-k} \int f(2t - j)\,f(2t - k)\,dt = \frac12 \sum_k c_k^2 = 1.$$

From the theorem, we know that $\{g_{nk}\}_{n,k \in \mathbb{Z}}$ is a tight frame, so for $m, j \in \mathbb{Z}$
fixed,

$$\begin{aligned}
1 = \|g_{mj}\|_2^2 &= \langle g_{mj}, g_{mj} \rangle \\
&= \Bigl\langle g_{mj}, \sum_{n,k} \langle g_{mj}, g_{nk} \rangle g_{nk} \Bigr\rangle \\
&= \sum_{n,k} |\langle g_{mj}, g_{nk} \rangle|^2.
\end{aligned}$$

Thus $\langle g_{mj}, g_{nk} \rangle = \delta_{mn} \delta_{jk}$, i.e., $\{g_{nk}\}_{n,k \in \mathbb{Z}}$ forms an orthonormal set. This,
combined with the tight frame property, implies that $\{g_{nk}\}_{n,k \in \mathbb{Z}}$ is an
orthonormal basis. ∎

Lawton and Cohen have independently established necessary and sufficient
conditions under which $f$ will be orthogonal to its integer translates.
Lawton's formulation is the following [24, 25]. $\ell^2$ denotes the space of all
square-summable sequences.

Theorem 4.6. Define the operator $G : \ell^2 \to \ell^2$ by

$$(Ga)_l = \frac12 \sum_{j,k} c_j c_k\, a_{2l+j-k} \quad \text{for } a \in \ell^2.$$

Then the coefficients $\{c_k\}$ determine a multiresolution analysis if and only if

1) (1.3) and (1.4) are satisfied, and

2) $\delta_{0l}$ is the only eigenvector for $G$ for the eigenvalue 1.


Proof. Note that $\delta_{0l}$ is an eigenvector for $G$ for the eigenvalue 1 because
of (1.4), and the sequence $a$ defined by $a_l = \int f(t)\,f(t + l)\,dt$ is also an
eigenvector for $G$ for the eigenvalue 1 since

$$\begin{aligned}
(Ga)_l &= \frac12 \sum_{j,k} c_j c_k \int f(t)\,f(t + 2l + j - k)\,dt \\
&= \int \Bigl( \sum_j c_j f(2t - j) \Bigr) \Bigl( \sum_k c_k f(2t + 2l - k) \Bigr)\,dt \\
&= \int f(t)\,f(t + l)\,dt \\
&= a_l.
\end{aligned}$$

Therefore, if $\delta_{0l}$ is the only eigenvector for $G$ for the eigenvalue 1 then
$a_l = c\,\delta_{0l}$ for some constant $c$, so $f$ is orthogonal to its integer translates. The
converse of this statement is proved in [25]. The proof is therefore complete
by the corollary to Theorem 4.5. ∎

Lawton has proved, using a result of Pollen [31], that except for a
set of measure zero, coefficients which satisfy (1.3) and (1.4) also satisfy
the condition that $\delta_{0l}$ be the only eigenvector for $G$ for the eigenvalue 1.
Therefore almost all choices of coefficients satisfying (1.3) and (1.4) will
determine a multiresolution analysis.
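Lawton's condition can be explored with exact rational arithmetic. The sketch below is an illustration under assumptions rather than Lawton's own procedure: it restricts $G$ to sequences supported on $|l| \le N - 1$ (a subspace that $G$ maps into itself and that contains both $\delta_{0l}$ and the sequence $a_l = \int f(t) f(t + l)\,dt$), and it counts the dimension of the eigenspace for the eigenvalue 1 by Gaussian elimination. The function names are hypothetical.

```python
from fractions import Fraction

def lawton_matrix(c):
    """Matrix of (Ga)_l = (1/2) sum_{j,k} c_j c_k a_{2l+j-k} on |l| <= N-1."""
    n = len(c) - 1                      # N
    idx = range(-(n - 1), n)            # l = -(N-1), ..., N-1

    def r(d):                           # r(d) = sum_j c_j c_{j-d}
        return sum(c[j] * c[j - d] for j in range(len(c)) if 0 <= j - d < len(c))

    return [[Fraction(1, 2) * r(m - 2 * l) for m in idx] for l in idx]

def rank(rows):
    """Rank by Gaussian elimination; exact when the entries are Fractions."""
    rows = [list(row) for row in rows]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(rk + 1, len(rows)):
            factor = rows[i][col] / rows[rk][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[rk])]
        rk += 1
    return rk

def eigenvalue_one_multiplicity(c):
    """Dimension of the null space of G - I on the restricted index range."""
    g = lawton_matrix(c)
    size = len(g)
    shifted = [[g[i][j] - (1 if i == j else 0) for j in range(size)]
               for i in range(size)]
    return size - rank(shifted)

# (c0, c3) = (.6, -.2) from Remark 3.8, and the exceptional point (1, 1).
good = [Fraction(3, 5), Fraction(6, 5), Fraction(2, 5), Fraction(-1, 5)]
bad = [Fraction(1), Fraction(0), Fraction(0), Fraction(1)]
print(eigenvalue_one_multiplicity(good), eigenvalue_one_multiplicity(bad))
```

For the coefficients $c_0 = c_3 = 1$, $c_1 = c_2 = 0$ the second eigenvector is the autocorrelation of $f = \frac13 \chi_{[0,3)}$, which is the example given in the proof of Proposition 4.3.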

Cohen's formulation, which has been shown to be equivalent to Lawton's,
is the following [6], cf. [25].


Theorem 4.7. The coefficients $\{c_k\}$ determine a multiresolution analysis if
and only if

1) (1.3) and (1.4) are satisfied, and

2)  there  exists  a  y  c  1-^/2,  n/21  such  that  fly  +  2k7T)  =  0  for  every  k  £  2,.

Remark 4.8. For N = 3, the set of points satisfying (1.3) and (1.4) is the circle
shown in Figure 1.1. Of these, every point with the single exception of the
point (1,1) does determine a multiresolution analysis [11].

Most signal information is carried by irregular structures and transient
phenomena. The mathematical characterization of singularities with Lipschitz
exponents is reviewed. We explain the theorems that estimate local
Lipschitz exponents of functions, from the evolution across scales of their
wavelet transform. We then prove that the local maxima of a wavelet
transform detect the locations of irregular structures and provide numerical
procedures to compute their Lipschitz exponents. The wavelet transforms
of singularities with fast oscillations have a different behavior that we study
separately. The local frequency of the oscillations is measured from the
wavelet transform local maxima. It has been shown numerically that one-
and two-dimensional signals can be reconstructed, with a good approximation,
from the local maxima of their wavelet transform [16]. As an application,
we develop an algorithm that removes white noise from signals, by
analyzing the evolution of the wavelet transform maxima across scales.

1. Introduction

Singularities and irregular structures often carry the most important
information in signals. In images, the discontinuities of the intensity provide the
locations of the object contours, which are particularly meaningful for
recognition purposes. For many other types of signals, from electro-cardiograms
to radar signals, the interesting information is given by transient phenomena
such as peaks. In physics, it is also important to study irregular structures to
infer properties about the underlying physical phenomena [17, 2, 1]. Until
recently, the Fourier transform was the main mathematical tool for analyzing
singularities. The Fourier transform is global and provides a description
of the overall regularity of signals, but it is not well adapted for finding
the location and the spatial distribution of singularities. This was a major
motivation for studying the wavelet transform in mathematics [20] and in
applied domains [11]. By decomposing signals into elementary building
blocks that are well localized both in space and frequency, the wavelet
transform can characterize the local regularity of signals. The wavelet transform
and its main properties are briefly introduced in Section 2. In mathematics,
the local regularity of a function is often measured with Lipschitz exponents.
Section 3 is a tutorial review on Lipschitz exponents and their characterization
with the Fourier transform and the wavelet transform. We explain the
basic theorems that relate local Lipschitz exponents to the evolution across
scales of the wavelet transform values. In practice, these theorems do not
provide simple and direct strategies for detecting and characterizing
singularities in signals. The following sections show that the wavelet transform
local maxima give an efficient approach for studying these singularities.

The detection of singularities with multiscale transforms has been studied
not only in mathematics but also in signal processing. In Section 4, we
explain the relation between the multiscale edge detection algorithms used
in computer vision and the approach of Grossmann [10] based on the phase
of the wavelet transform. The detection of wavelet transform local maxima
is strongly motivated by these techniques. Section 5 is a mathematical
analysis of the local maxima properties. We prove that local maxima detect
all singularities and that local Lipschitz exponents can often be measured
from their evolution across scales. We derive practical algorithms to analyze
isolated or non-isolated singularities in signals. Numerical examples
illustrate the mathematical results. The wavelet transform has a different
behavior when singularities have fast oscillations. This particular case is
studied separately. The local frequency of the oscillations can be measured
from the points where the wavelet transform is locally maximum both along
the scale and spatial variables. This approach is closely related to the work
of Escudie and Torresani [9] for measuring the modulation law of asymptotic
signals [8].

Another important issue is to understand whether one can reconstruct
a signal from the local maxima of its wavelet transform. If it is possible, it
allows us to process a signal's singularities by modifying the local maxima
of its wavelet transform and then reconstruct the corresponding function.
We review the most recent results of Meyer [21] on this completeness issue
and describe a numerical algorithm developed by Zhong and one of
us [16], which closely reconstructs a signal from the wavelet local maxima.
One application is the removal of white noise from signals. In such problems,
we often have some prior information on the differences between the
signal singularities and the noise singularities. We describe an algorithm
that differentiates the signal components from the noise, by selecting the
wavelet transform local maxima that correspond to the signal singularities.
After removing the local maxima of the noise fluctuations, we reconstruct a
"denoised" signal.


• For any function $f(x)$, $f_s(x)$ denotes the dilation of $f(x)$ by the scale
factor $s$: $f_s(x) = \frac{1}{s} f\bigl(\frac{x}{s}\bigr)$.

2. Continuous wavelet transform

This section reviews the main properties of the wavelet transform. The
formalism of the continuous wavelet transform was first introduced by Morlet
and Grossmann [11]. Let $\psi(x)$ be a complex valued function. The function
$\psi(x)$ is said to be a wavelet if and only if its Fourier transform $\hat\psi(\omega)$ satisfies


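$$\int_0^{+\infty} \frac{|\hat\psi(\omega)|^2}{\omega}\,d\omega = \int_{-\infty}^{0} \frac{|\hat\psi(\omega)|^2}{|\omega|}\,d\omega = C_\psi < +\infty. \qquad (2.1)$$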


This condition implies that

$$\int_{-\infty}^{+\infty} \psi(u)\,du = 0.$$


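Let $\psi_s(x) = \frac{1}{s} \psi\bigl(\frac{x}{s}\bigr)$ be the dilation of $\psi(x)$ by the scale factor $s$. The wavelet
transform of a function $f \in L^2(\mathbb{R})$ is defined by

$$Wf(s, x) = f * \psi_s(x). \qquad (2.2)$$

The Fourier transform of $Wf(s, x)$ with respect to the $x$ variable is simply
given by

$$\hat{W}f(s, \omega) = \hat f(\omega)\,\hat\psi(s\omega). \qquad (2.3)$$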

The wavelet transform can easily be extended to tempered distributions,
which is useful for the scope of this paper. For a thorough presentation of
the theory of distributions, the reader might want to consult the book of
Treves [26]. If $f(x)$ is a tempered distribution of order $n$ and if the wavelet
$\psi(x)$ is $n$ times continuously differentiable, then the wavelet transform of
$f(x)$ given by (2.2) is well defined. For example, a Dirac $\delta(x)$ is a tempered
distribution of order 0 and $W\delta(s, x) = \psi_s(x)$, if $\psi(x)$ is continuous.

One can prove [11] that the wavelet transform is invertible and $f(x)$ is
recovered with the formula

$$f(x) = \frac{1}{C_\psi} \int_0^{+\infty} \int_{-\infty}^{+\infty} Wf(s, u)\,\bar\psi_s(u - x)\,du\,\frac{ds}{s}, \qquad (2.4)$$

where $\bar\psi_s(x)$ denotes the complex conjugate of $\psi_s(x)$. The wavelet transform
$Wf(s, x)$ is a function of the scale $s$ and the spatial position $x$. The plane
defined by the ordered pair of variables $(s, x)$ is called the scale-space plane
[27]. An arbitrary function $F(s, x)$ is not a priori the wavelet transform of
some function $f(x)$. One can prove that $F(s, x)$ is a wavelet transform if and
only if it satisfies the reproducing kernel equation

$$F(s_0, x_0) = \int_0^{+\infty} \int_{-\infty}^{+\infty} F(s, x)\,K(s_0, s, x_0, x)\,dx\,\frac{ds}{s}, \qquad (2.5)$$

with

$$K(s_0, s, x_0, x) = \frac{1}{C_\psi} \int_{-\infty}^{+\infty} \bar\psi_s(u - x)\,\psi_{s_0}(x_0 - u)\,du. \qquad (2.6)$$

The reproducing kernel $K(s_0, s, x_0, x)$ expresses the intrinsic redundancy
between the value of the wavelet transform at $(s, x)$ and its value at $(s_0, x_0)$.


3. Characterization of local regularity with the wavelet transform

As mentioned in the introduction, a remarkable property of the wavelet
transform is its ability to characterize the local regularity of a function. In
mathematics, the local regularity of functions is often measured with Lipschitz
exponents.

Definition 3.1.

• Let $n$ be a positive integer and $n \le \alpha \le n + 1$. A function $f(x)$ is said
to be Lipschitz $\alpha$, at $x_0$, if and only if there exist two constants $A$ and
$h_0 > 0$, and a polynomial $P_n(x)$ of order $n$ such that for $h < h_0$

$$|f(x_0 + h) - P_n(h)| \le A |h|^\alpha. \qquad (3.1)$$

• The function $f(x)$ is uniformly Lipschitz $\alpha$ over the interval $]a, b[$ if and
only if there exists a constant $A$ such that for any $x_0 \in\ ]a, b[$ there exists
a polynomial of order $n$, $P_n(x)$, such that equation (3.1) is satisfied for
any $x_0 + h \in\ ]a, b[$.

• We call Lipschitz regularity of $f(x)$ at $x_0$ the sup of all values $\alpha$ such
that $f(x)$ is Lipschitz $\alpha$ at $x_0$.

• We say that a function is singular at $x_0$ if it is not Lipschitz 1 at $x_0$.

A function $f(x)$ which is continuously differentiable at a point is Lipschitz 1
at this point. If the derivative of $f(x)$ is bounded but discontinuous
at $x_0$, $f(x)$ is still Lipschitz 1 at $x_0$ and following Definition 3.1 we consider
that $f(x)$ is not singular at $x_0$. One can easily prove that if $f(x)$ is Lipschitz $\alpha$,
for $\alpha > n$, then $f(x)$ is $n$ times differentiable at $x_0$ and the polynomial $P_n(h)$
is the first $n + 1$ terms of the Taylor series of $f(x)$ at $x_0$. For $n = 0$, we
have $P_n(h) = f(x_0)$. The Lipschitz regularity $\alpha_0$ gives an indication of the
differentiability of $f(x)$ but it is more precise. If the Lipschitz regularity $\alpha_0$ of
$f(x)$ satisfies $n < \alpha_0 < n + 1$, then we know that $f(x)$ is $n$ times differentiable
at $x_0$ but its $n$-th derivative is a distribution which is singular at $x_0$, and $\alpha_0$
characterizes this singularity.

One can prove that if $f(x)$ is Lipschitz $\alpha$ then its primitive $g(x)$ is
Lipschitz $\alpha + 1$. However, it is not true that if a function is Lipschitz $\alpha$ at
a point $x_0$, then its derivative is Lipschitz $\alpha - 1$ at the same point. This is
due to oscillatory phenomena that are further studied in Section 5.3. On the
other hand, one can prove that if $\alpha$ is not an integer and $\alpha > 1$, a function
is uniformly Lipschitz $\alpha$ on an interval $]a, b[$ if and only if its derivative is
uniformly Lipschitz $\alpha - 1$ on the same interval. This property enables us
to define negative uniform Lipschitz exponents for tempered distributions.
Integer Lipschitz exponents have a different behavior that is not studied in
this article. It is necessary to define properly the notion of negative Lipschitz
exponents for tempered distributions because they are often encountered in
numerical computations.

Definition 3.2. Let $f(x)$ be a tempered distribution of finite order. Let $\alpha$ be
a non-integer real number and $[a, b]$ an interval of $\mathbb{R}$. The distribution $f(x)$
is said to be uniformly Lipschitz $\alpha$ on $]a, b[$ if and only if its primitive is
uniformly Lipschitz $\alpha + 1$ on $]a, b[$.

For example, the second order primitive of a Dirac is a function which
is piece-wise linear in the neighborhood of $x = 0$. This function is uniformly
Lipschitz 1 in the neighborhood of 0 and thus uniformly Lipschitz $\alpha$ for $\alpha < 1$.
As a consequence of Definition 3.2, we can see that a Dirac is uniformly
Lipschitz $\alpha$ for $\alpha < -1$ in the neighborhood of 0. Since Definition 3.2 is not
valid for integer Lipschitz exponents, it does not allow us to conclude that a
Dirac is Lipschitz $-1$ at 0 but we can derive that its Lipschitz regularity (see
Definition 3.1) is $-1$ in the neighborhood of 0. Definition 3.2 is global because
uniform Lipschitz exponents are defined over intervals but not at points. It is
possible to make a local extension of Lipschitz exponents to negative values
through the microlocalization theory of Bony [5, 15], but these sophisticated
results go beyond the scope of this article. For isolated singularities, one can
define pointwise Lipschitz exponents through Definition 3.2. We shall say
that a distribution $f(x)$ has an isolated singularity Lipschitz $\alpha$ at $x_0$ if and
only if $f(x)$ is uniformly Lipschitz $\alpha$ over an interval $]a, b[$, with $x_0 \in\ ]a, b[$,
and $f(x)$ is uniformly Lipschitz 1 over any sub-interval of $]a, b[$ that does not
include $x_0$. For example, a Dirac centered at 0 has an isolated singularity at
$x = 0$ whose Lipschitz regularity is $-1$.

A classical tool for measuring the Lipschitz regularity of a function
$f(x)$ is to look at the asymptotic decay of its Fourier transform $\hat f(\omega)$. One
can prove that a bounded function $f(x)$ is uniformly Lipschitz $\alpha$ over $\mathbb{R}$ if it
satisfies

$$\int_{-\infty}^{+\infty} |\hat f(\omega)|\,(1 + |\omega|^\alpha)\,d\omega < +\infty. \qquad (3.2)$$

This condition is sufficient but not necessary. It gives a global regularity
condition over the whole real line but one cannot derive whether the function
is locally more regular at a particular point $x_0$. This is because the Fourier
transform unlocalizes the information along the spatial variable $x$. The
Fourier transform is therefore not well adapted to measure the local Lipschitz
regularity of functions.

If the wavelet has compact support, the value of Wf(s, x0) depends
upon the values of f(x) in a neighborhood of x0 of size proportional to
the scale s. At fine scales, it provides localized information on f(x). The
following theorems relate the asymptotic decay of the wavelet transform at
small scales to the local Lipschitz regularity. We suppose that the wavelet
ψ(x) is continuously differentiable and that it has compact support, although
this last condition is not strictly necessary. The first theorem is a well-known
result and a proof can be found in [13].

Theorem 3.3. Let f(x) ∈ L²(ℝ) and [a, b] be an interval of ℝ. Let 0 < α < 1.
The function f(x) is uniformly Lipschitz α over any interval ]a + ε, b - ε[,
with b - a > ε > 0, if and only if there exists a constant A_ε such that for any
x ∈ ]a + ε, b - ε[ and any scale s > 0,

\[ |Wf(s, x)| \le A_\epsilon \, s^{\alpha}. \tag{3.3} \]

If f(x) ∈ L²(ℝ), then for any scale s0 > 0, by applying the Cauchy-Schwarz
inequality, we can easily prove that the function |Wf(s, x)| is bounded over
the domain s > s0. Hence, (3.3) is really a condition on the asymptotic decay
of |Wf(s, x)| when the scale s goes to zero. The sufficient condition (3.2) based
on the Fourier transform implies that |f̂(ω)| has a decay "faster" than 1/ω^α.
Equation (3.3) is similar if one considers the scale s as locally "equivalent"
to 1/ω. However, in contrast to the Fourier transform condition, (3.3) is a
necessary and sufficient condition and is localized on intervals and not over
the whole real line.

In order to extend Theorem 3.3 to Lipschitz exponents α larger than
1, we must impose that the wavelet ψ(x) has enough vanishing moments.
A wavelet ψ(x) is said to have n vanishing moments if and only if for all
integers k with 0 ≤ k < n, it satisfies

\[ \int_{-\infty}^{+\infty} x^{k} \, \psi(x) \, dx = 0. \tag{3.4} \]
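
As a numerical illustration of (3.4), the following sketch counts the vanishing moments of the first three derivatives of a Gaussian. The choice of Gaussian derivatives, the helper names `gaussian_derivative` and `count_vanishing_moments`, the grid and the tolerance are all assumptions made for this illustration. These wavelets do not have compact support, although their tails are negligible on the grid used here.

```python
import numpy as np
from numpy.polynomial import hermite_e


def gaussian_derivative(order, x):
    """order-th derivative of the unit Gaussian exp(-x^2/2)/sqrt(2*pi).

    Uses d^n/dx^n exp(-x^2/2) = (-1)^n He_n(x) exp(-x^2/2), where He_n is
    the probabilists' Hermite polynomial.
    """
    coeffs = np.zeros(order + 1)
    coeffs[order] = 1.0
    theta = np.exp(-x**2 / 2.0) / np.sqrt(2.0 * np.pi)
    return (-1.0) ** order * hermite_e.hermeval(x, coeffs) * theta


def count_vanishing_moments(psi, x, max_order=8, tol=1e-6):
    """Largest n such that the moments of order 0..n-1 of psi vanish (3.4)."""
    dx = x[1] - x[0]
    for k in range(max_order):
        moment = np.sum(x**k * psi) * dx  # Riemann sum for the k-th moment
        if abs(moment) > tol:
            return k
    return max_order


x = np.arange(-12000, 12001) * 1e-3  # uniform grid on [-12, 12]
for order in (1, 2, 3):
    n = count_vanishing_moments(gaussian_derivative(order, x), x)
    print(f"Gaussian derivative of order {order}: {n} vanishing moment(s)")
```

In this sketch the n-th derivative of the Gaussian is found to have exactly n vanishing moments, which matches the later assumption that the wavelet is the n-th derivative of a smoothing function.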

If the wavelet ψ(x) has n vanishing moments, then Theorem 3.3 remains
valid for any non-integer value α such that 0 < α < n. Let us see how this
extension works, in order to understand the impact of vanishing moments.
Since ψ(x) has compact support, ψ̂(ω) is n times continuously differentiable,
and one can derive from (3.4) that ψ̂(ω) has a zero of order n at ω = 0. For
any integer p < n, ψ̂(ω) can be factored into

\[ \hat\psi(\omega) = (i\omega)^{p} \, \hat\psi^{1}(\omega). \tag{3.5} \]

In the spatial domain we have

\[ \psi(x) = \frac{d^{p} \psi^{1}(x)}{dx^{p}}, \tag{3.6} \]

and the function ψ¹(x) satisfies the wavelet admissibility condition (2.1).
The p-th derivative of any function f(x) is well defined in the sense of
distributions. Hence,

\[ Wf(s, x) = f * \psi_s(x) = \frac{d^{p}}{dx^{p}} \left( f * s^{p} \psi^{1}_{s} \right)(x) = s^{p} \left( \frac{d^{p} f}{dx^{p}} * \psi^{1}_{s} \right)(x). \tag{3.7} \]

The wavelet transform of f(x) with respect to the wavelet ψ(x) is thus equal
to the wavelet transform of its p-th derivative, computed with the wavelet
ψ¹(x), and multiplied by s^p. Let p be an integer such that 0 < α - p < 1.
The function f(x) is uniformly Lipschitz α on an interval ]a, b[ if and only if
d^p f/dx^p is uniformly Lipschitz α - p on the same interval. Since 0 < α - p < 1,
Theorem 3.3 applies to the wavelet transform of d^p f/dx^p defined with respect
to the wavelet ψ¹. Theorem 3.3 shows that d^p f/dx^p is uniformly Lipschitz α - p
over intervals ]a + ε, b - ε[ if and only if we can find constants A_ε > 0 such
that for x ∈ ]a + ε, b - ε[,

\[ \left| \frac{d^{p} f}{dx^{p}} * \psi^{1}_{s}(x) \right| \le A_\epsilon \, s^{\alpha - p}. \]

Equation (3.7) proves that this is true if and only if

\[ |Wf(s, x)| \le A_\epsilon \, s^{\alpha}. \tag{3.8} \]


Equation (3.8) extends Theorem 3.3 for α < n. If ψ(x) has n vanishing
moments but not n + 1, then the decay of |Wf(s, x)| does not tell us anything
about Lipschitz exponents for α > n. For example, the function f(x) = sin(x)
is uniformly Lipschitz +∞ on any interval, but if ψ(x) has exactly n vanishing
moments one can easily prove that the asymptotic decay of |Wf(s, x)| is
equivalent to s^n on any interval. This decay does not allow us to derive
anything on the regularity of the (n + 1)-th derivative of sin(x). For α < 0 and
α ∉ ℤ, (3.3) of Theorem 3.3 remains valid to characterize uniform Lipschitz
exponents. In this case, we do not need to impose more than one vanishing
moment on the wavelet ψ(x). The proof can easily be derived from the
statement of Definition 3.2.

For integer Lipschitz exponents α, (3.3) is necessary but not sufficient to
prove that a function f(x) is uniformly Lipschitz α over intervals ]a + ε, b - ε[.
If α = 1 and the wavelet has at least two vanishing moments, the class of
functions that satisfy (3.3), for any x ∈ ℝ, is called the Zygmund class.
This class of functions is larger than the set of functions that are uniformly
Lipschitz 1. For example, x log(x) belongs to the Zygmund class although
it is not Lipschitz 1 at x = 0. The reader is referred to Meyer's book [20] for
more detailed explanations on the Zygmund class.

Theorem 3.3 gives a characterization of the Lipschitz regularity over
intervals but not at a point. The second theorem, proved by Jaffard [14],
shows that one can also estimate the Lipschitz regularity of f(x) precisely
at a point x0. The theorem gives a necessary condition and a sufficient
condition but not a necessary and sufficient condition. We suppose that
ψ(x) has n vanishing moments, is n times continuously differentiable, and
has compact support. Similar theorems on pointwise differentiability have also
been proved by Holschneider and Tchamitchian [13].

Theorem 3.4. Let n be a positive integer and α ≤ n. Let f(x) ∈ L²(ℝ). If f(x)
is Lipschitz α at x0, then there exists a constant A such that for all points x
in a neighborhood of x0 and any scale s,

\[ |Wf(s, x)| \le A \left( s^{\alpha} + |x - x_0|^{\alpha} \right). \tag{3.9} \]

Conversely, let α < n be a non-integer value. The function f(x) is Lipschitz α
at x0 if the two following conditions hold.

1) There exists ε > 0 and a constant A such that for all points x in a
neighborhood of x0 and any scale s,

\[ |Wf(s, x)| \le A \, s^{\epsilon}. \tag{3.10} \]

2) There exists a constant B such that for all points x in a neighborhood
of x0 and any scale s,

\[ |Wf(s, x)| \le B \left( s^{\alpha} + \frac{|x - x_0|^{\alpha}}{|\log |x - x_0||} \right). \tag{3.11} \]


As a result of Theorem 3.3, we know that (3.10) implies that f(x) is
uniformly Lipschitz ε in some neighborhood of x0. The value ε can be
arbitrarily small. To interpret (3.9) and (3.11), let us define in the scale-space
the cone of points (s, x) that satisfy

\[ |x - x_0| \le s. \]

For (s, x) inside this cone, (3.9) and (3.11) imply that when s goes to zero,
|Wf(s, x)| = O(s^α). Below this cone, the value of |Wf(s, x)| is controlled by the
distance of x with respect to x0, but the necessary and sufficient conditions
have different upper bounds. Equation (3.11) means that for (s, x) below
the cone,

\[ |Wf(s, x)| = O\!\left( \frac{|x - x_0|^{\alpha}}{|\log |x - x_0||} \right). \]

The behavior of the wavelet transform inside a cone pointing to x0, and
below this cone, are two components that must often be treated separately.

Theorems 3.3 and 3.4 prove that the wavelet transform is particularly
well adapted to estimate the local regularity of functions. For example,
Holschneider and Tchamitchian [13] used a similar result to analyze the
differentiability of the Riemann-Weierstrass function. As mentioned in the
introduction, we often want to detect and characterize the irregular parts of
signals. Many interesting physical processes yield irregular structures that
are currently being studied [2]. A well-known example is turbulence
at high Reynolds numbers, where there is still no comprehensive theory to
understand the nature and distribution of irregular structures [4]. In signal
processing, singularities often carry most of the signal information. In
numerical experiments, it is however difficult to apply Theorems 3.3
and 3.4 directly in order to detect singularities and to characterize their
Lipschitz exponents. Indeed, these theorems require measuring the decay of
|Wf(s, x)| in a whole two-dimensional neighborhood of x0 in the scale-space (s, x),
which requires a lot of computation. The next section briefly reviews the
different techniques that have been used to numerically detect singularities
with a wavelet transform. We then explain how singular points are related
to the wavelet transform local maxima.

4.  Detection  and  measurement  of  singularities 

The measurement of the wavelet transform decay, in a whole neighborhood
of a point x0 in the scale space (s, x), is numerically expensive. One technique
that is often used in numerical applications is to compute the decay
of |Wf(s, x)| only at a fixed abscissa x = x0. This means that we measure the
evolution of the wavelet transform along the vertical line that points to x0 in
the scale space (s, x). Although this approach can provide a good estimate
of the local Lipschitz exponent in many cases, let us explain through a
simple counterexample why it cannot be used reliably. We suppose that the
wavelet ψ(x) is symmetrical with respect to 0 and has compact support. Let
f(x) = 0 for x < x0 and f(x) = 1 for x ≥ x0. We can derive that Wf(s, x) =
χ((x - x0)/s), where χ(x) is the primitive of ψ(x) with compact support.
Since ψ(x) is symmetrical, χ(x) is antisymmetrical and hence χ(0) = 0. We
thus derive that for any s > 0, Wf(s, x0) = 0. Since χ(x) has compact support,
for any x ≠ x0, there exists a scale s_x > 0 such that if s < s_x then Wf(s, x) = 0.
This proves that along each vertical line in the scale-space plane, the wavelet
transform is uniformly zero for scales small enough. If we estimate the local
Lipschitz exponents from the decay of the wavelet transform along vertical
lines, it "looks like" the function f(x) has no singularity although it does
have a discontinuity at x0. The mistake comes from the fact that we did
not measure the decay of the wavelet transform inside a two-dimensional
neighborhood of x0, as is required by Theorems 3.3 and 3.4. Similar
counterexamples are encountered in many usual signals. The function sin(1/x) is
another type of counterexample which is studied in Section 5.3.
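
The step-function counterexample can be checked numerically. The sketch below is only an illustration under stated assumptions: the symmetric wavelet is taken to be the second derivative of a Gaussian (which is not compactly supported, so the transform away from x0 is negligibly small rather than exactly zero), and the helper names, grid and scales are hypothetical choices.

```python
import numpy as np


def mexican_hat(u):
    """Second derivative of the unit Gaussian: symmetric, zero mean."""
    return (u**2 - 1.0) * np.exp(-u**2 / 2.0) / np.sqrt(2.0 * np.pi)


def wavelet_transform(f, dx, s, wavelet):
    """Wf(s, .) = f * psi_s with psi_s(x) = (1/s) psi(x/s), on a uniform grid."""
    half = int(np.ceil(8.0 * s / dx))      # truncate the kernel at about 8 s
    t = np.arange(-half, half + 1) * dx
    psi_s = wavelet(t / s) / s
    return np.convolve(f, psi_s, mode="same") * dx


dx = 0.002
x = np.arange(-2000, 2001) * dx            # grid on [-4, 4]; x0 = 0 is a node
f = np.heaviside(x, 0.5)                   # unit step located at x0 = 0
center = 2000                              # index of x0
window = np.abs(x) <= 1.0                  # stay away from the grid borders

for s in (0.05, 0.1, 0.2):
    w = wavelet_transform(f, dx, s, mexican_hat)
    print(f"s={s}: |Wf(s,x0)|={abs(w[center]):.1e}, "
          f"max over x of |Wf(s,x)|={np.abs(w[window]).max():.3f}")
```

Along the vertical line x = x0 the transform stays at zero, whereas its maximum over a neighborhood of x0 does not decay with the scale, which is the behavior expected of a discontinuity.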

In his pioneering work on wavelets, Grossmann [10] gives an approach
to detect singularities with a wavelet which is a Hardy function. A Hardy
function g(x) is a complex function whose Fourier transform satisfies

\[ \hat g(\omega) = 0 \quad \text{for } \omega < 0. \tag{4.1} \]

Let f ∈ L²(ℝ) and Wf(s, x) be the complex wavelet transform built with a
Hardy wavelet. For a fixed scale s, (2.3) implies that the Fourier transform
Ŵf(s, ω) is also zero at negative frequencies, so it is also a Hardy function.
Let φ(s, x) and ρ(s, x) be respectively the argument and modulus of the
complex number Wf(s, x). The argument φ(s, x) is also called the phase of
the wavelet transform. Grossmann [10] indicates that in the neighborhood
of an isolated singularity located at x0, the lines in the scale-space (s, x)
where the phase φ(s, x) remains constant converge to the abscissa x0 when
the scale s goes to 0. One can use this observation to detect singularities,
but the phase φ(s, x) is not sufficient to measure their Lipschitz regularity.
Moreover, the value of φ(s, x) is unstable when the modulus ρ(s, x) is close to
zero. It is thus necessary to combine the modulus and the phase information
to characterize the different singularities, but no effective method has been
derived yet.

In computer vision, it is extremely important to detect the edges that
appear in images, and many researchers [25, 27, 18, 19, 6] have developed
techniques based on multiscale transforms. These multiscale transforms
are equivalent to a wavelet transform but were studied before the
development of the wavelet formalism. Let us call a smoothing function
any real function θ(x) such that θ(x) = O(1/(1 + x²)) and whose Fourier
transform satisfies θ̂(0) ≠ 0. The integral of a smoothing function is therefore
nonzero. A smoothing function can be viewed as the impulse response of a
low-pass filter. An important example often used in computer vision is the
Gaussian function. Let θ_s(x) = (1/s) θ(x/s). Edges at the scale s are defined as
local sharp variation points of f(x) smoothed by θ_s(x). Let us explain how
to detect these edges with a wavelet transform. Let ψ¹(x) and ψ²(x) be the
two wavelets defined by

\[ \psi^{1}(x) = \frac{d\theta(x)}{dx} \quad \text{and} \quad \psi^{2}(x) = \frac{d^{2}\theta(x)}{dx^{2}}. \tag{4.2} \]


The wavelet transforms defined with respect to each of these wavelets are
given by

\[ W^{1}f(s, x) = f * \psi^{1}_{s}(x) \quad \text{and} \quad W^{2}f(s, x) = f * \psi^{2}_{s}(x). \tag{4.3} \]

We derive that

\[ W^{1}f(s, x) = f * \left( s \frac{d\theta_s}{dx} \right)(x) = s \frac{d}{dx} (f * \theta_s)(x) \tag{4.4} \]

and

\[ W^{2}f(s, x) = f * \left( s^{2} \frac{d^{2}\theta_s}{dx^{2}} \right)(x) = s^{2} \frac{d^{2}}{dx^{2}} (f * \theta_s)(x). \tag{4.5} \]

The wavelet transforms W¹f(s, x) and W²f(s, x) are proportional to,
respectively, the first and second derivative of f(x) smoothed by θ_s(x). For a
fixed scale s, the local extrema of W¹f(s, x) along the x variable correspond to
the zero-crossings of W²f(s, x) and to the inflection points of f * θ_s(x) (see
Figure 4.1).
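
The relation between the two wavelet transforms and the derivatives of the smoothed signal can be verified on a sampled signal. The following sketch assumes a Gaussian smoothing function, a single smooth edge tanh(x/0.3), and finite differences for the derivatives; the helper names and all numerical parameters are hypothetical choices for this illustration.

```python
import numpy as np


def gaussian(u):
    """Smoothing function theta: the unit Gaussian."""
    return np.exp(-u**2 / 2.0) / np.sqrt(2.0 * np.pi)


def psi1(u):
    """psi^1 = d(theta)/dx for the Gaussian smoothing function."""
    return -u * gaussian(u)


def dilated_convolution(f, dx, s, kernel):
    """f * k_s on a uniform grid, with k_s(x) = (1/s) k(x/s), cut at about 8 s."""
    half = int(np.ceil(8.0 * s / dx))
    t = np.arange(-half, half + 1) * dx
    return np.convolve(f, kernel(t / s) / s, mode="same") * dx


dx, s = 0.01, 0.2
x = np.arange(-500, 501) * dx                      # grid on [-5, 5]
f = np.tanh(x / 0.3)                               # one smooth edge centred at 0

w1 = dilated_convolution(f, dx, s, psi1)           # W^1 f(s, x) = f * psi^1_s
smooth = dilated_convolution(f, dx, s, gaussian)   # f * theta_s
w1_from_derivative = s * np.gradient(smooth, dx)   # s d/dx (f * theta_s)
w2 = s**2 * np.gradient(np.gradient(smooth, dx), dx)  # s^2 d^2/dx^2 (f * theta_s)

interior = np.abs(x) <= 2.0                        # ignore border effects
mismatch = np.max(np.abs(w1 - w1_from_derivative)[interior])
i_max = np.flatnonzero(interior)[np.argmax(np.abs(w1[interior]))]
print("largest mismatch between both sides of (4.4):", mismatch)
print("extremum of W1 at x =", x[i_max],
      "| W2 changes sign there:", w2[i_max - 1] * w2[i_max + 1] < 0)
```

In this example the extremum of W¹f sits at the edge location, and W²f changes sign across the same abscissa.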

If the wavelet ψ²(x) is continuously differentiable, the wavelet
transform W²f(s, x) is a differentiable surface in the scale-space plane. Hence,
the zero-crossings of W²f(s, x) define a set of smooth curves that often look
like fingerprints [27]. Let us prove that one can define a particular Hardy
wavelet such that the phase of the wavelet transform remains constant or
changes sign along these fingerprints.

Let ψ³(x) be the Hilbert transform of ψ²(x) and ψ⁴(x) = ψ²(x) + iψ³(x).
The wavelet ψ⁴(x) is a Hardy wavelet. Let W⁴f(s, x) = f * ψ⁴_s(x). The real
part of W⁴f(s, x) is equal to W²f(s, x). Hence, the phase φ(s, x) is equal to
π/2 or -π/2 if and only if W²f(s, x) = 0. Since W⁴f(s, x) is a continuous
function, the phase φ(s, x) cannot jump from π/2 to -π/2 along a connected
line in the scale space, unless the modulus is equal to 0. If the modulus of
W⁴f(s, x) is equal to 0, the phase is not defined and it can change sign at these
points. Similarly to lines of constant phase, the zero-crossing "fingerprints"
indicate the locations of sharp variation points and singularities but do not
characterize their Lipschitz regularity. We need more information about the
decay of |W²f(s, x)| in the neighborhood of these zero-crossing lines.
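
A Hardy wavelet of this kind can be approximated on a sampled grid. The sketch below is an assumption-laden illustration: it takes ψ² to be the second derivative of a Gaussian and replaces the continuous Hilbert transform with a periodic discrete one, obtained by zeroing the negative-frequency FFT coefficients. The helper name `analytic_part` and the sampling parameters are hypothetical.

```python
import numpy as np


def analytic_part(real_samples):
    """Return h + i * Hilbert(h) by zeroing the negative-frequency FFT bins."""
    n = len(real_samples)                   # assumed even
    spectrum = np.fft.fft(real_samples)
    gain = np.zeros(n)
    gain[0] = 1.0                           # keep the zero-frequency bin
    gain[1:n // 2] = 2.0                    # double the positive frequencies
    gain[n // 2] = 1.0                      # keep the Nyquist bin
    return np.fft.ifft(spectrum * gain)


n = 1024
x = (np.arange(n) - n // 2) * 0.05
psi2 = (x**2 - 1.0) * np.exp(-x**2 / 2.0)   # second derivative of a Gaussian
psi4 = analytic_part(psi2)                  # psi^4 = psi^2 + i * psi^3

negative_bins = np.fft.fft(psi4)[n // 2 + 1:]
print("largest negative-frequency coefficient:", np.abs(negative_bins).max())
print("real part equals psi^2:", np.allclose(psi4.real, psi2))
```

The real part of the sampled ψ⁴ reproduces ψ², so the real part of the corresponding transform is W²f, as stated above.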

Detecting the zero-crossings of W²f(s, x) or the local extrema of
W¹f(s, x) are similar procedures, but the local extrema approach has several
important advantages. An inflection point of f * θ_s(x) can either be a
maximum or a minimum of the absolute value of its first derivative. As at
the abscissae x0 and x2 of Figure 4.1, the local maxima of the absolute value
of the first derivative are sharp variation points of f * θ_s(x), whereas the
minima correspond to slow variations (abscissa x1). These two types of
inflection points can be distinguished by looking at whether an extremum of
|W¹f(s, x)| is a maximum or a minimum, but they cannot be differentiated
from the zero-crossings of W²f(s, x). For edge or singularity detection,
we are only interested in the local maxima of |W¹f(s, x)|. When detecting
the local maxima of |W¹f(s, x)|, we can also keep the value of the wavelet
transform at the corresponding location. With the results of Theorems 3.3
and 3.4, we prove in the next section that the values of these local maxima
often characterize the Lipschitz exponents of the signal irregularities.


Figure 4.1: The extrema of W¹f(s, x) and the zero-crossings of
W²f(s, x) are the inflection points of f * θ_s(x). The points of
abscissa x0 and x2 are sharp variations of f * θ_s(x) and are local
maxima of |W¹f(s, x)|. The local minimum of |W¹f(s, x)| in x1 is also
an inflection point but it is a slow variation point.


5.  Wavelet  transform  local  maxima 

5.1.  General  properties 

By supposing that the wavelet ψ(x) is the first derivative of a smoothing
function, we impose that ψ(x) has only one vanishing moment. In general,
we do not want to impose only one vanishing moment because, as explained
in Section 3, we then cannot estimate Lipschitz exponents larger than 1. In
this section, we study the mathematical properties of the wavelet local
maxima and explain how to measure Lipschitz exponents. Let us first define
precisely what we mean by local maximum.

Definition 5.1. Let Wf(s, x) be the wavelet transform of a function f(x).

■ We call local extremum any point (s0, x0) such that ∂Wf(s0, x)/∂x has a
zero-crossing at x = x0, when x varies.

■ We call local maximum any point (s0, x0) such that |Wf(s0, x)| <
|Wf(s0, x0)| when x belongs to either a right or the left neighborhood
of x0, and |Wf(s0, x)| ≤ |Wf(s0, x0)| when x belongs to the other side of
the neighborhood of x0.

■ We call maxima line any connected curve in the scale space (s, x) along
which all points are local maxima.
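
On sampled data, the local maxima of Definition 5.1 at a fixed scale can be located by comparing each sample with its two nearest neighbors. The function below is a minimal sketch of this test. Its name and the nearest-neighbor discretization of "neighborhood" are assumptions.

```python
import numpy as np


def modulus_maxima(w_row):
    """Indices of the local maxima of |Wf(s0, .)| in the sense of Definition 5.1.

    A sample is kept when its modulus is strictly larger than one of its two
    neighbours and at least as large as the other one.
    """
    m = np.abs(np.asarray(w_row))
    left, mid, right = m[:-2], m[1:-1], m[2:]
    keep = ((mid > left) & (mid >= right)) | ((mid >= left) & (mid > right))
    return np.flatnonzero(keep) + 1


row = [0.0, 1.0, 3.0, 3.0, 1.0, 0.0, -2.0, 0.0]
print(modulus_maxima(row))
```

For this row the function returns the indices 2, 3 and 6: both samples of the plateau are strict maxima on one side, and the negative sample at index 6 is a maximum of the modulus.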

A local maximum (s0, x0) of the wavelet transform is strictly maximum
either on the right or the left side of x0. To speak of a local maximum of
the wavelet transform is an abuse of language since we really mean a local
maximum of the wavelet transform modulus, but it simplifies the explanations.
The first theorem proves that if the wavelet transform has no maximum in a
neighborhood, then the function is uniformly Lipschitz α, for α < n.

Theorem 5.2. Let n be a strictly positive integer. Let ψ(x) be a wavelet
with compact support, n vanishing moments and n times continuously
differentiable. Let f(x) ∈ L¹([a, b]).

■ If there exists a scale s0 > 0 such that for all scales s < s0 and x ∈ ]a, b[,
Wf(s, x) has no local maxima, then for any ε > 0 and α < n, f(x) is
uniformly Lipschitz α on ]a + ε, b - ε[.

■ If ψ(x) is the n-th derivative of a smoothing function, then f(x) is
uniformly Lipschitz n on any such interval ]a + ε, b - ε[.

The proof of this theorem is in Appendix A. In the following, we
suppose that ψ(x) is the n-th derivative of a smoothing function. In this case
we can prove that the function is locally Lipschitz α for the integer value
α = n because the wavelet ψ(x) has no more than n vanishing moments.
Theorem 5.2 implies that on the intervals ]a + ε, b - ε[, f(x) has no
singularity. Indeed, singularities were defined as points where the function is
not Lipschitz 1. Let us define the closure of the wavelet transform maxima
of f(x) as the set of points x0 such that for any ε > 0 and scale s0 > 0,
there exists a wavelet transform local maximum at a point (s1, x1) that satisfies
|x1 - x0| < ε and s1 < s0. This closure is the set of points on the real line that
are arbitrarily close to some local maxima in the scale-space (s, x).


Corollary to 5.2. The closure of the set of points where f(x) is not Lipschitz n
is included in the closure of the wavelet transform maxima of f(x).

This corollary is a straightforward implication of Theorem 5.2. It proves
that all singularities of f(x) can be located by following the maxima lines
when the scale goes to zero. It is however not true that the closure of the
points where f(x) is not Lipschitz n is equal to the closure of the wavelet
transform maxima. Equation (5.10) proves for example that if ψ(x) is
antisymmetrical then for f(x) = sin(x), all the points pπ, p ∈ ℤ, belong to the
closure of the wavelet local maxima, although sin(x) is infinitely
continuously differentiable at these points. Let us now study how to use the value of
the wavelet transform maxima in order to estimate the Lipschitz regularity of
f(x) at the points that belong to the closure of the wavelet transform maxima.

5.2.  Non-oscillating  singularities 

In this section, we study the characterization of singularities when locally the
function has no oscillations. The next section explains the potential impact
of oscillations. We suppose that the wavelet ψ(x) has compact support, is
n times continuously differentiable and is the n-th derivative of a smoothing
function. The following theorem characterizes a particular class of isolated
singularities from the behavior of the wavelet transform local maxima.

Theorem 5.3. Let f(x) be a tempered distribution whose wavelet transform
is well defined over ]a, b[ and let x0 ∈ ]a, b[. We suppose that there exists
a scale s0 > 0 and a constant C such that for x ∈ ]a, b[ and s < s0, all the
maxima of Wf(s, x) belong to a cone defined by

\[ |x - x_0| \le C s. \tag{5.1} \]

Then, at all points x1 ∈ ]a, b[, x1 ≠ x0, f(x) is uniformly Lipschitz n in
a neighborhood of x1. Let α < n be a non-integer. The function f(x) is
Lipschitz α at x0 if and only if there exists a constant A such that each local
maximum (s, x) in the cone defined by (5.1) satisfies

\[ |Wf(s, x)| \le A \, s^{\alpha}. \tag{5.2} \]

The proof of this theorem is given in Appendix B. Equation (5.2) is
equivalent to

\[ \log |Wf(s, x)| \le \log(A) + \alpha \log(s). \tag{5.3} \]

If the wavelet transform maxima satisfy the cone distribution imposed by
Theorem 5.3, (5.3) proves that the Lipschitz regularity at x0 is the maximum
slope of straight lines that remain above log |Wf(s, x)|, on a logarithmic scale.
The fact that all local maxima remain in a cone that points to x0 implies that
f(x) is Lipschitz n at all points x ∈ ]a, b[, x ≠ x0. Figures 5.2a through 5.2e
show the wavelet transform of a function with isolated singularities that
verify the cone localization hypothesis. To compute this wavelet transform
we used a wavelet with only 1 vanishing moment. The graphs of ψ(x) and
its primitive θ(x) are shown in Figures 5.1a and 5.1b. The Fourier transform
of ψ(x) is

\[ \hat\psi(\omega) = i\omega \left( \frac{\sin(\omega/4)}{\omega/4} \right)^{4}. \tag{5.4} \]

This wavelet belongs to a class for which the wavelet transform can be
computed with a fast algorithm [28].


Figure 5.1a: Graph of a wavelet ψ(x) with compact support and one
vanishing moment. It is a quadratic spline.

In numerical computations, the input function is not known at all
abscissae x but is characterized by a uniform sampling which approximates f(x)
at a resolution that depends upon the sampling interval [16]. These samples
are generally the result of a low-pass filtering of f(x) followed by a uniform
sampling. If we suppose for normalization purposes that the resolution is 1,
then we can compute the wavelet transform of f(x) only at scales larger than
1. When a function is approximated at a finite resolution, strictly speaking, it

Figure 5.1b: Graph of the primitive θ(x) with compact support.


Figure 5.2a: In the left neighborhood of the abscissa 0.16, the signal
locally behaves like 1 + (0.16 - x)^0.2 whereas in the right
neighborhood it behaves like 1 + (x - 0.16)^0.6. At the abscissa 0.44 the
signal has a discrete Dirac (Lipschitz regularity equal to -1). At 0.7,
the Lipschitz regularity is 1.5 and at the abscissa 0.88 the signal is
discontinuous.


is not meaningful to speak about singularities, discontinuities and Lipschitz
exponents. This is illustrated by the fact that we cannot compute the
asymptotic decay of the wavelet transform amplitude since we cannot compute the
wavelet transform at scales smaller than 1. In practice, we still want to use the

Figure 5.2b: Wavelet transform between the scales 1 and 2® computed
with the wavelet shown in Figure 5.1a. The finer scales are at
the top and the scale varies linearly along the vertical. Black, grey
and white points indicate that the wavelet transform has respectively
negative, zero and positive values.




Figure 5.2c: Each black point indicates the position of a local
maximum in the wavelet transform shown in Figure 5.2b. The singularity
of the derivative cannot be detected at the abscissa 0.7 because the
wavelet has only one vanishing moment.


mathematical tools that describe singularities, even though we are limited
by the resolution of measurements. Suppose that the approximation of f(x)
at the resolution 1 is given by a set of samples (f_n), n ∈ ℤ, with f_n = 0 for n < n0
and f_n = 1 for n ≥ n0, like at the abscissa 0.88 of Figure 5.2a. We would

Figure 5.2d: Local maxima of the wavelet transform of the signal in
Figure 5.2a, computed with a wavelet with two vanishing moments.
The number of maxima lines increases. The singularity of the
derivative at 0.7 can now be detected from the decay of the wavelet local
maxima.


like to say that at the resolution 1, f(x) behaves as if it has a discontinuity at
n = n0, although it is possible that f(x) is continuous at n0 but has a sharp
transition at that point which is not visible at the resolution 1. The
characterization of singularities from the decay of the wavelet transform enables
us to give a precise meaning to this discontinuity at the resolution 1. Since
we cannot measure the asymptotic decay of the wavelet transform when the
scale goes to 0, we measure the decay of the wavelet transform up to the
finest scale available. The Lipschitz exponents are computed by finding the
coefficient α such that A s^α best approximates the decay of |Wf(s, x)| over
a given range of scales larger than 1 (see Figure 5.2b). With this approach,
we can use Lipschitz exponents to characterize the irregularities of discrete
signals. In Figure 5.2b, the discontinuity appears clearly from the fact that
|Wf(s, x)| remains approximately constant over a large range of scales, in
the neighborhood of the abscissa 0.88. Negative Lipschitz exponents
correspond to sharp irregularities where the wavelet transform modulus increases
at fine scales. A sequence (f_n), n ∈ ℤ, with f_n = 0 for n ≠ n0 and f_n0 = 1
can be viewed as the approximation of a Dirac at the resolution 1. At the
abscissa 0.44, the signal of Figure 5.2a has such a discrete Dirac. The wavelet
transform maxima increase proportionally to s^(-1) over a large range of scales,
in the corresponding neighborhood. In the rest of this paper, we suppose
that all numerical experiments are performed on functions approximated at
the resolution 1 and we consider that the decay of the wavelet transform
at scales larger than 1 characterizes the Lipschitz exponent of the function




Figure 5.2e: Decay of log2 |Wf(s, x)| as a function of log2(s) along
the two maxima lines that converge to the point of abscissa 0.16,
computed with the wavelet of Figure 5.1a. The two different slopes
show that f(x) has a different singular behavior in the left and
right neighborhood of 0.16, and we can distinguish the two exponents
0.2 and 0.6.


up  to  the  resolution  1.  Fast  algorithms  to  compute  the  wavelet  transform 
are  described  in  [16, 12].  We  shall  not  worry  anymore  about  the  opposition 
between  asymptotic  measurements  and  finite  resolution. 

The local maxima of the wavelet transform of Figure 5.2b are shown
in Figure 5.2c. The black lines indicate the position of the local maxima in
the scale-space. Figure 5.2e gives the value of $\log_2 |Wf(s,x)|$ as a function
of $\log_2(s)$ along each of the two maxima lines that converge to the point of
abscissa 0.16, between the scales 2' and 2*. It is interesting to observe that
at fine scales, the slopes of these two maxima lines are different and are
approximately equal to 0.2 and 0.6. This shows that f(x) behaves like a
function Lipschitz 0.2 in its left neighborhood and a function Lipschitz 0.6 in
its right neighborhood. The Lipschitz regularity of f(x) at 0.16 is 0.2, which
is the smaller of the two slopes.
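
The slope measurement described above can be sketched numerically. The following is a minimal illustration, not the authors' implementation: the function name `lipschitz_slope` and the synthetic amplitudes are assumptions, and the amplitudes along each maxima line are taken to follow an exact power law of the scale.

```python
import numpy as np

def lipschitz_slope(scales, amplitudes):
    """Least-squares slope of log2|Wf| against log2(s) along one maxima line."""
    log_s = np.log2(np.asarray(scales, dtype=float))
    log_w = np.log2(np.abs(np.asarray(amplitudes, dtype=float)))
    slope, _intercept = np.polyfit(log_s, log_w, 1)
    return slope

# Synthetic maxima-line amplitudes |Wf(s, x)| = A * s**alpha (assumed data).
scales = 2.0 ** np.arange(1, 7)
left_line = 3.0 * scales ** 0.2
right_line = 1.5 * scales ** 0.6

alpha_left = lipschitz_slope(scales, left_line)
alpha_right = lipschitz_slope(scales, right_line)
# The pointwise regularity is the smaller of the two one-sided estimates.
alpha_at_point = min(alpha_left, alpha_right)
```

On real data the fit would be restricted to the range of scales over which the decay is actually linear in log-log coordinates.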

At this point one might wonder how to choose the number of vanishing
moments to analyze a particular class of signals. If we want to estimate
the Lipschitz exponents up to a maximum value n, we know that we need a
wavelet with at least n vanishing moments. In Figure 5.2c, there is one maxima
line converging to the abscissa 0.7 along which the decay of log |Wf(s,x)|
is proportional to log(s). The signal was built from a function whose derivative
is singular but this cannot be detected from the slope of log |Wf(s,x)|
because the wavelet has only one vanishing moment. Figure 5.2d shows the
maxima lines obtained from a wavelet which has two vanishing moments.
The decay of the wavelet transform along the two maxima lines that converge
to the abscissa 0.7 indicates that f(x) is Lipschitz 1.5 at this location. Using
wavelets with more vanishing moments has the advantage of being able to
measure the Lipschitz regularity up to a higher order but it also increases the
number of maxima lines, as can be observed by comparing Figure 5.2c and
Figure 5.2d. Let us prove this last observation. A wavelet $\psi(x)$ with n + 1
vanishing moments is the derivative of a wavelet $\psi^{1}(x)$ with n vanishing
moments. Similarly to (4.4), we obtain

$$Wf(s,x) = s\,\frac{d}{dx}\left(f * \psi^{1}_{s}\right)(x) = s\,\frac{\partial}{\partial x} W^{1}f(s,x). \qquad (5.5)$$

The wavelet transform of f(x) defined with respect to $\psi(x)$ is proportional
to the derivative of the wavelet transform of f(x) with respect to $\psi^{1}(x)$.
Hence, the number of local maxima of |Wf(s,x)| is always larger than the
number of local maxima of $|W^{1}f(s,x)|$. The number of maxima at a given
scale often increases linearly with the number of moments of the wavelet.
In order to minimize the amount of computations, we want to have the
minimum number of maxima necessary to detect the interesting irregular
behavior of the signal. This means that we must choose a wavelet with
as few vanishing moments as possible but with enough moments to detect
the Lipschitz exponents of highest order that we are interested in. Another
related property that influences the number of local maxima is the number of
oscillations of the wavelet $\psi(x)$. For most types of singularities, the number
of maxima lines converging to the singularity depends upon the number of
local extrema of the wavelet itself. A Dirac $\delta(x)$ gives a simple verification
of this property since $W\delta(s,x) = \frac{1}{s}\psi(x/s)$. A wavelet with n vanishing
moments has at least n + 1 local maxima. In numerical computations, it
is better to choose a wavelet with exactly n + 1 local maxima. In image
processing, we often want to detect discontinuities and peaks which have
Lipschitz exponents smaller than 1. It is therefore sufficient to use a wavelet
with only one vanishing moment. In signals obtained from turbulent fluids,
interesting structures have a Lipschitz exponent between 0 and 2 [3]. We thus
need a wavelet with two vanishing moments to analyze turbulent structures.
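
As an illustration of the link between vanishing moments and local extrema, the sketch below checks both properties numerically for the n-th derivative of a Gaussian. This family is an assumption chosen for convenience (it is not the wavelet of Figure 5.1a), and the helper names are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval

x = np.linspace(-8.0, 8.0, 4001)
dx = x[1] - x[0]

def gaussian_derivative(n):
    """n-th derivative of exp(-x^2/2), equal to (-1)^n He_n(x) exp(-x^2/2)."""
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0
    return (-1) ** n * hermeval(x, coeffs) * np.exp(-x ** 2 / 2)

def moment(n, k):
    """Riemann-sum approximation of the k-th moment of the wavelet."""
    return np.sum(x ** k * gaussian_derivative(n)) * dx

def count_extrema(n):
    """Number of local extrema of the wavelet, i.e. local maxima of |psi|."""
    signs = np.sign(np.diff(gaussian_derivative(n)))
    signs = signs[signs != 0]
    return int(np.sum(signs[1:] != signs[:-1]))

for n in (1, 2, 3):
    # Moments of order k < n vanish; the wavelet has n + 1 local extrema.
    print(n, [round(moment(n, k), 8) for k in range(n + 1)], count_extrema(n))
```

For this family the count is exactly n + 1, which is the situation recommended in the text for numerical computations.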

Let us suppose that the wavelet $\psi(x)$ has a symmetrical support equal
to [-K, K]. We call the cone of influence of $x_0$ in the scale-space plane the set
of points (s,x) that satisfy

$$|x - x_0| \le Ks.$$

It is the set of points (s,x) for which Wf(s,x) is influenced by the value of
f(x) at $x_0$. In order to characterize the regularity of f(x) at a point $x_0$, one
might think that it is sufficient to measure the decay of the wavelet transform
within the cone of influence of $x_0$. Theorem 3.4 proves that this is wrong in
general and that one must also measure the decay of the wavelet transform
below this cone of influence. This is due to oscillations that can create a
singularity at $x_0$. The next theorem shows that if we suppose that f(x) has
no such oscillations, then the regularity of f(x) at a point $x_0$ is characterized
by the behavior of its wavelet transform along any line that belongs to a
cone strictly smaller than the cone of influence. Section 5.3 explains why
this property is wrong when f(x) oscillates too much. In the following we
suppose that $\psi(x)$ is a wavelet which is n times continuously differentiable,
has a support equal to [-K, K], and is equal to the nth derivative of a function
$\theta(x)$. We also impose that $\theta(x)$ is strictly positive on the interval ]-K, K[.

Theorem 5.4. Let $x_0 \in \mathbb{R}$ and $f(x) \in L^2(\mathbb{R})$. We suppose that there exists an
interval ]a,b[, with $x_0 \in ]a,b[$, and a scale $s_0 > 0$ such that the wavelet
transform Wf(s,x) has a constant sign for $s < s_0$ and $x \in ]a,b[$. Let us
also suppose that there exist a constant B and $\epsilon > 0$ such that for all points
$x \in ]a,b[$ and any scale s

$$|Wf(s,x)| \le B s^{\epsilon}. \qquad (5.6)$$

Let x = X(s) be a curve in the scale space (s,x) such that $|x_0 - X(s)| \le Cs$,
with C < K. If there exists a constant A such that for any scale $s < s_0$, the
wavelet transform satisfies

$$|Wf(s,X(s))| \le A s^{\gamma} \quad \text{with } 0 \le \gamma \le n, \qquad (5.7)$$

then f(x) is Lipschitz $\alpha$ at $x_0$, for any $\alpha < \gamma$.

The proof of this theorem is in Appendix C. One can easily prove
that the sign constraint over the wavelet transform of f(x) is equivalent to
imposing that the nth derivative of f(x) is a distribution whose restriction to
]a,b[ has a constant sign. Theorem 5.4 shows that the regularity of f(x) is
controlled by the behavior of its wavelet transform in the cone of influence,
if its nth derivative does not have an oscillatory behavior that accelerates in
the neighborhood of $x_0$. A similar theorem can be obtained if we suppose
that the nth derivative of f(x) has a constant sign over $]a, x_0[$ and $]x_0, b[$ but
changes sign at $x_0$. This means that in the neighborhood of $x_0$, Wf(s,x) has
only one zero-crossing at any fixed scale s which is small enough. When s
goes to zero, the zero-crossing curve converges to the abscissa $x_0$. In this case,
we need to control the decay of the wavelet transform along two lines that
remain respectively in the left and the right part of the cone of influence of $x_0$.

From Theorem 5.4, one can compute the Lipschitz regularity of non-isolated
singularities from the behavior of the wavelet transform maxima.
We test whether the wavelet transform has a constant sign in the neighborhood
of $x_0$ by testing the sign of the wavelet transform local maxima. It is
also sufficient to verify (5.6) along the lines of maxima in the neighborhood
of $x_0$. The Lipschitz regularity of f(x) at $x_0$ is computed from the decay of
the wavelet transform along one line of maxima that converges towards $x_0$.
Let us emphasize again that if at each scale the wavelet transform has only
one zero-crossing in a neighborhood of $x_0$, Theorem 5.4 can be extended
by measuring the decay of the wavelet transform along two curves that are
respectively in the left and the right parts of the cone of influence of $x_0$.

A "devil staircase" is an interesting example to illustrate the application
of Theorem 5.4 to the detection of non-isolated singularities. The derivative
of a devil staircase is a Cantor measure. For the devil staircase shown in
Figure 5.4a, the Cantor measure is built recursively as follows. For p = 0,
the support of the measure $\mu_0$ is the interval [0,1], and it has a uniform
density equal to 1 on [0,1]. The measure $\mu_p$ is defined by subdividing each
domain where $\mu_{p-1}$ has a uniform density equal to a constant c > 0, into
three domains whose respective sizes are 1/5, 2/5 and 2/5 of the original one. The
measure $\mu_p$ is equal to 0 in the central part and gives the first and last
parts respectively 1/3 and 2/3 of the mass that $\mu_{p-1}$ gave to the whole domain
(see Figure 5.3). One can verify that $\int \mu_p(dx) = 1$. The
limit measure $\mu_\infty$ obtained with this iterative process is a Cantor measure.
The devil staircase is defined by:

$$f(x) = \int_0^x \mu_\infty(dx).$$
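
The recursive construction can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code: the mass of each domain is split into fractions 1/3 (first part) and 2/3 (last part), which is the weighting consistent with a total mass of 1 and with the bounds on the Lipschitz exponents quoted for this staircase; the function names are hypothetical.

```python
from fractions import Fraction

def cantor_intervals(p):
    """Intervals (start, length, mass) carrying the measure after p subdivisions."""
    intervals = [(Fraction(0), Fraction(1), Fraction(1))]
    for _ in range(p):
        refined = []
        for start, length, mass in intervals:
            # First part: size 1/5 of the domain, one third of its mass.
            refined.append((start, length / 5, mass / 3))
            # Central part (size 2/5) carries no mass.
            # Last part: size 2/5 of the domain, two thirds of its mass.
            refined.append((start + 3 * length / 5, 2 * length / 5, 2 * mass / 3))
        intervals = refined
    return intervals

def staircase(x, intervals):
    """Approximate devil staircase: mass of [0, x] for the p-th measure."""
    x = Fraction(x)
    total = Fraction(0)
    for start, length, mass in intervals:
        if x >= start + length:
            total += mass
        elif x > start:
            total += mass * (x - start) / length
    return total

levels = cantor_intervals(6)
values = [staircase(Fraction(k, 100), levels) for k in range(101)]
```

Exact rational arithmetic is used so that the total mass and the plateau values can be checked without rounding error.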


Figure  5.4a  shows  the  graph  of  a  devil  staircase  and  Figure  5.4b  its  wavelet 
transform  computed  with  the  wavelet  of  Figure  5.1a.  For  a  devil  staircase, 
we  can  prove  that  the  maxima  lines  converge  exactly  to  the  points  where  the 
function  f(x)  is  singular.  There  is  no  maxima  line  that  converges  to  a  point 
where  the  function  is  not  singular. 

Proof. By definition, the set of points where the maxima lines converge is
the closure of the wavelet transform maxima, and the Corollary to 5.2 proves
that it includes the closure of the points where f(x) is singular. For a devil
staircase, the set of points where f(x) is singular is equal to the
support of the Cantor measure, which is a closed set. It is thus equal to its
closure. For any point $x_0$ outside this closed set, we can find a neighborhood
$]x_0 - \epsilon, x_0 + \epsilon[$ which does not intersect the support of $\mu_\infty$. On this open
interval, f(x) is constant so for s small enough and $x \in ]x_0 - \epsilon/2, x_0 + \epsilon/2[$,
Wf(s,x) is equal to zero. The point $x_0$ therefore cannot belong to the closure
of the wavelet transform maxima. This proves that the closure of the wavelet
transform maxima is included in the singular support of f(x). Since the
opposite is also true, it implies that both sets are equal. ∎

For the particular devil staircase that we defined, the Lipschitz regularity
of each singular point depends upon the location of the point. One
can prove [3] that at all locations, the Lipschitz exponent $\alpha$ satisfies

$$\frac{\log(2/3)}{\log(2/5)} \le \alpha \le \frac{\log(1/3)}{\log(1/5)}.$$

Hence, (5.6) of Theorem 5.4 is verified for $\epsilon < \log(2/3)/\log(2/5)$. Since a
devil staircase is monotonically increasing and our wavelet is the derivative
of a positive function, the wavelet transform remains positive. Theorem 5.4
proves that the local Lipschitz regularity of f(x) at any singular point can be
estimated from the decay of the wavelet transform along the maxima line that
converges to that point. Figure 5.4c shows the position of the maxima lines in
the scale-space. The renormalization properties of the Cantor set appear as
renormalization properties of the graph of maxima lines. Muzy, Bacry and
Arneodo [23] have shown that one can precisely compute the singularity
spectrum $f(\alpha)$ of multifractal signals from the evolution across scales of the
wavelet transform local maxima. These results are particularly interesting
for studying irregular physical phenomena such as turbulence [23].


Figure 5.3: Recursive operation for building a multifractal Cantor
measure, shown for the steps p = 0 to p = 3. The Cantor measure is
obtained at the limit of this iterative procedure.


Figure 5.4b: Wavelet transform of the devil staircase computed with
the wavelet of Figure 5.1a. Black and white points indicate respectively
that the wavelet transform is zero or strictly positive.

Figure 5.4c: Local maxima of the wavelet transform shown in Figure 5.4b.


5.3. Singularities with fast oscillations

If the function f(x) is oscillating quickly in the neighborhood of $x_0$, then one
cannot characterize the Lipschitz regularity of f(x) from the behavior of its
wavelet transform in the cone of influence of $x_0$. We say that a function f(x)
has fast oscillations at $x_0$ if and only if there exists $\alpha > 0$ such that f(x) is not
Lipschitz $\alpha$ at $x_0$ but its primitive is Lipschitz $\alpha + 1$ at $x_0$. This situation occurs
when f(x) is a function which oscillates very quickly and whose singularity
behavior at $x_0$ is dominated by these oscillations. The integral of f(x)
averages f(x) so the oscillations are attenuated and the Lipschitz exponent at
$x_0$ increases by more than 1. Singularities with such an oscillatory behavior
have been thoroughly studied in mathematics [29]. A classical example is
the function f(x) = sin(1/x) in the neighborhood of x = 0. This function is
not continuous at 0 but is bounded in the neighborhood of 0 so its Lipschitz
regularity is equal to 0 at x = 0. Let g(x) be a primitive of sin(1/x). One
can easily prove that $|g(x) - g(0)| = O(x^2)$ in the neighborhood of x = 0,
so g(x) is Lipschitz 2 at this point. By computing the primitive of f(x), we
increase the Lipschitz exponent by 2 because the oscillations of sin(1/x) are
attenuated by the averaging effect.
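
The quadratic bound on the primitive of sin(1/x) can be checked directly. The derivation below is a sketch for x > 0, taking g(0) as the limit of g at 0; the case x < 0 follows by symmetry. It uses the change of variable u = 1/t followed by one integration by parts.

```latex
\begin{align*}
g(x) - g(0) &= \int_0^x \sin(1/t)\,dt
  = \int_{1/x}^{+\infty} \frac{\sin u}{u^2}\,du \qquad (u = 1/t) \\
&= x^2 \cos(1/x) - 2\int_{1/x}^{+\infty} \frac{\cos u}{u^3}\,du, \\
|g(x) - g(0)| &\le x^2 + 2\int_{1/x}^{+\infty} \frac{du}{u^3} = 2x^2 .
\end{align*}
```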

Let f(x) be a function with fast oscillations at $x_0$ and let g(x) be its
primitive. Let $\psi^{1}(x)$ be the derivative of $\psi(x)$. Since g(x) is Lipschitz $\alpha + 1$,
the necessary condition (3.9) of Theorem 3.4 implies that in a neighborhood
of $x_0$, the wavelet transform defined with respect to $\psi^{1}(x)$ satisfies

$$|W^{1}g(s,x)| \le A\left(s^{\alpha+1} + |x - x_0|^{\alpha+1}\right). \qquad (5.8)$$

Similarly to (4.4) we can prove that

$$W^{1}g(s,x) = g * \psi^{1}_{s}(x) = s\,(f * \psi_{s})(x) = s\,Wf(s,x).$$

We thus derive that

$$|Wf(s,x)| \le A\left(s^{\alpha} + \frac{|x - x_0|^{\alpha+1}}{s}\right). \qquad (5.9)$$

This equation proves that although f(x) is not Lipschitz $\alpha$, in the cone of
influence of $x_0$, $|Wf(s,x)| = O(s^{\alpha})$. The fact that f(x) is not Lipschitz $\alpha$ cannot
be detected from the decay of |Wf(s,x)| inside the cone of influence of $x_0$,
but by looking at its decay below the cone of influence, as a function of
$|x - x_0|$. Since f(x) is not Lipschitz $\alpha$, the necessary condition (3.9) implies
that for (s,x) below the cone of influence of $x_0$, the wavelet transform does
not satisfy $|Wf(s,x)| = O(|x - x_0|^{\alpha})$. When a function has fast oscillations, its
worst singular behavior at a point $x_0$ is observed below the cone of influence
of $x_0$ in the scale-space plane.

Let us study in more detail the case of f(x) = sin(1/x). Since the
primitive is Lipschitz 2, we can take $\alpha = 1$. Equation (5.9) implies that in
the cone of influence of 0, the wavelet transform satisfies |Wf(s,x)| = O(s).
Figure 5.4e shows the wavelet transform of sin(1/x). It has a high amplitude
along a curve in the scale space (s,x) which reaches (0,0) below the cone of
influence of 0. It is along this path in the scale-space that the singular part
of f(x) reaches 0. Let us interpret this curve and prove that it is a parabola.
Through this analysis we derive a procedure to estimate locally the size of
the oscillations of f(x).

The function f(x) = sin(1/x) can be written $f(x) = \sin(\omega_x x)$, where
$\omega_x = 1/x^2$ can be viewed as an "instantaneous" frequency. Let us compute
the wavelet transform of a sinusoidal wave of constant frequency $\omega_0$. If
we suppose that the wavelet $\psi(x)$ is antisymmetrical, as is the case in our
numerical computations, from (2.3) we derive that the wavelet transform of
$h(x) = \sin(\omega_0 x)$ satisfies

$$|Wh(s,x)| = |\cos(\omega_0 x)|\,|\hat{\psi}(s\omega_0)|. \qquad (5.10)$$

For a symmetrical wavelet, the cosine is replaced by a sine in the right-hand
side of this equation. For a fixed abscissa x, the decay of |Wh(s,x)| as a
function of s is proportional to the decay of $|\hat{\psi}(s\omega_0)|$. If $|\hat{\psi}(\omega)|$ reaches its
maximum at $\omega = \omega_m$, then for x fixed, |Wh(s,x)| is maximum at $s_0 = \omega_m/\omega_0$.
The scale where |Wh(s,x)| is maximum is inversely proportional to the
frequency of the sinusoidal wave. The value of Wh(s,x) depends on the values
of h(x) in a neighborhood of size proportional to the scale s, so the
frequency measurement is local. Since f(x) = sin(1/x) has an instantaneous
frequency $\omega_x = 1/x^2$, for a fixed abscissa x, |Wf(s,x)| is globally maximum
for $s \approx \omega_m/\omega_x = \omega_m x^2$. This is why we see in Figure 5.4e that the wavelet
transform has a maximum amplitude along a parabola that converges to
the abscissa 0 in the scale-space. This "instantaneous" frequency measurement
is based on an idea that has been developed previously by Escudie
and Torresani [9] for measuring the modulation law of asymptotic signals.
The results of Escudie and Torresani have also been refined by Delprat et
al. [8], who explain how to precisely extract the amplitude and frequency
modulation laws from a complex wavelet transform.

Let us now study the behavior of the wavelet transform maxima. The
inflection points of f(x) are located at $x = 1/(n\pi)$, for $n \in \mathbb{Z}$. Since the
wavelet $\psi(x)$ has only one vanishing moment, all the maxima lines converge
toward the points $x = 1/(n\pi)$. Since f(x) is continuously differentiable in
the neighborhood of $1/(n\pi)$, the wavelet transform along a maxima line
converging to $1/(n\pi)$ satisfies

$$|Wf(s,x)| \le A_n s. \qquad (5.11)$$

The derivative of f(x) at $1/(n\pi)$ is equal to $(-1)^{n+1}(n\pi)^2$, so one can derive
that $A_n = O(n^2)$. It is interesting to observe that along all maxima lines
in the neighborhood of 0, the wavelet transform decays proportionally to
the scale s although f(x) is discontinuous at 0. This singularity at 0 can
however be detected because the constants $A_n$ grow to $+\infty$ when we get
closer to 0. Figure 5.4f displays the local maxima of the wavelet transform
of sin(1/x). In the neighborhood of 0, at fine scales, the maxima lines have a
different geometry in the scale space (s,x) due to the aliasing when sampling
sin(1/x) for numerical computations. Let us now introduce the general
maxima points and explain how they are related to the size of the oscillations
of f(x).


Figure 5.4d: Graph of sin(1/x).

Definition 5.5. We call general maximum of Wf(s,x) a point $(s_0, x_0)$ where
|Wf(s,x)| has a strict local maximum within a two-dimensional neighborhood
in the scale-space plane (s,x).
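
On a sampled scale-space grid, Definition 5.5 can be approximated by comparing each sample with its eight neighbors. The sketch below is only an illustration: the function name `general_maxima`, the 8-neighbor rule and the two-bump test array are assumptions, not part of the original algorithm.

```python
import numpy as np

def general_maxima(abs_w):
    """Indices (i, j) where abs_w is strictly larger than its 8 grid neighbors."""
    rows, cols = abs_w.shape
    found = []
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            patch = abs_w[i - 1:i + 2, j - 1:j + 2].copy()
            center = patch[1, 1]
            patch[1, 1] = -np.inf          # exclude the center from the comparison
            if center > patch.max():
                found.append((i, j))
    return found

# Toy |Wf| on a (scale, position) grid: two bumps (assumed test data).
s = np.linspace(0.1, 2.0, 40)[:, None]
x = np.linspace(-3.0, 3.0, 121)[None, :]
abs_w = (np.exp(-((s - 0.5) ** 2) / 0.02 - (x + 1.0) ** 2 / 0.1)
         + 0.6 * np.exp(-((s - 1.5) ** 2) / 0.02 - (x - 1.0) ** 2 / 0.1))
peaks = general_maxima(abs_w)
```

Samples on the border of the grid are skipped, since their two-dimensional neighborhood is incomplete.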

Figure 5.4e: Wavelet transform of sin(1/x). The amplitude is maximum
along a parabola in the scale-space that converges to (0,0).

Figure 5.4f: Local maxima of the wavelet transform.

Figure 5.4g: The maxima lines are displayed from the scale where the
largest general maximum is located. The extremity of each maxima
line indicates the position of a general maxima point and it belongs
to a parabola in the scale-space (s,x).

Clearly, a general maxima point belongs to a local maxima line as defined
by Definition 5.1. General maxima are points where |Wf(s,x)| reaches
a local maximum when the variables (s,x) vary along a maxima line.
Equation (5.10) proves that the maxima lines of the wavelet transform of $\sin(\omega_0 x)$
are vertical lines in the scale-space plane (s,x) given by $x = n\pi/\omega_0$, for $n \in \mathbb{Z}$. If
$|\hat{\psi}(\omega)|$ has one global maximum, for $\omega > 0$, at $\omega_m$ and no other local maxima,
then (5.10) implies that there is only one general maximum along each maxima
line and it appears at the scale $s_0 = \omega_m/\omega_0$. A wavelet equal to the nth
derivative of a Gaussian has such a property. If $|\hat{\psi}(\omega)|$ has several local maxima,
for $\omega > 0$, there are several general maxima along each maxima line but
the one where |Wf(s,x)| has the highest value is at the scale $s_0 = \omega_m/\omega_0$.
One can thus recover the frequency $\omega_0$ from the location of this general
maximum. Figure 5.4g displays the sub-part of each maxima line that is below the
general maxima of maximum amplitude. In the scale-space, these general
maxima belong to a parabola whose equation is approximately given by
$s = \omega_m/\omega_x = \omega_m x^2$. This equation is only an approximation because the
frequency $\omega_x$ varies locally. A finer analysis of this type of property can be
found in the work of Delprat et al. [8]. If f(x) is locally equal to the sum of
several sinusoidal waves whose frequencies are well apart, so that they can be
discriminated by $\hat{\psi}(s\omega)$ when s varies (see (5.10)), then we can measure the
frequency of each of these sinusoidal waves from the scales of the general
maxima that they produce. The efficiency of this method depends on how
concentrated the support of $\hat{\psi}(\omega)$ is. Here, we are limited by the uncertainty
principle, which implies that $\psi(x)$ cannot have its energy well concentrated
both in the spatial and frequency domains. To distinguish spectral lines that
are too close, it is necessary to use more sophisticated methods as described
by Delprat et al. [8].
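
The relation $s_0 = \omega_m/\omega_0$ between the scale of the general maximum and the frequency of a sinusoid can be checked numerically. The sketch below assumes a specific antisymmetrical wavelet, the first derivative of a Gaussian, for which $|\hat{\psi}(\omega)|$ is proportional to $|\omega| e^{-\omega^2/2}$ and therefore peaks at $\omega_m = 1$; the function name and the grids are hypothetical.

```python
import numpy as np

def wavelet_transform_at(x0, s, omega0, u):
    """Wh(s, x0) = (h * psi_s)(x0) for h(x) = sin(omega0 x), by a Riemann sum."""
    # psi(x) = -x exp(-x^2/2), dilated as psi_s(u) = psi(u/s) / s.
    psi_s = -(u / s) * np.exp(-(u / s) ** 2 / 2) / s
    h = np.sin(omega0 * (x0 - u))
    return np.sum(h * psi_s) * (u[1] - u[0])

omega0 = 5.0
omega_m = 1.0   # location of the maximum of |psi_hat| for this wavelet
u = np.linspace(-6.0, 6.0, 24001)
scales = np.linspace(0.05, 0.6, 111)

# x0 = 0 lies on a maxima line, since |cos(omega0 * x0)| = 1 there.
amplitudes = np.array([abs(wavelet_transform_at(0.0, s, omega0, u)) for s in scales])
s_best = scales[np.argmax(amplitudes)]
omega_estimate = omega_m / s_best
```

The estimate is only as fine as the grid of scales; a denser grid, or an interpolation around the maximum, would refine it.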

Let us now give a spatial domain interpretation of this frequency
measurement. We show that if the wavelet $\psi(x)$ has only one vanishing moment,
the general maxima points provide measurements of the local oscillations
even if the function is not locally similar to a sinusoidal wave. If $\psi(x)$ is the
derivative of a smoothing function $\theta(x)$, (4.4) proves that

$$Wf(s,x) = s\,\frac{d}{dx}\left(f * \theta_s\right)(x),$$

hence

$$Wf(s,x) = \int_{-\infty}^{+\infty} \frac{df}{du}(u)\,\theta\!\left(\frac{x-u}{s}\right) du. \qquad (5.12)$$

If locally f(x) has a simple oscillation like in Figure 5.5, $df/dx$ has a constant
sign between the two top points $x_1$ and $x_2$ of the oscillation. The point
$(s_0, x_0)$ is a general maximum if the support of $\theta((x_0 - x)/s_0)$ covers as much
as possible the positive part of $df/dx$ without paying the cost of covering a
domain where $df/dx$ is too negative. This means that the distance between
the two top points of the oscillation is of the order of the size of the support
of $\theta(x)$ multiplied by the scale $s_0$:

$$x_2 - x_1 \approx K s_0. \qquad (5.13)$$

This spatial domain interpretation shows that even if the function is not
locally similar to a sinusoidal wave, the size of the oscillation is approximately
proportional to the scale $s_0$ of the general maxima point. If the wavelet $\psi(x)$
has more than one vanishing moment, this spatial interpretation is not valid.


Figure 5.5: We suppose that the wavelet is the first derivative of a
smoothing function $\theta(x)$. The point $(s_0, x_0)$ is a general maximum of
the wavelet transform of f(x) if the function $\theta_{s_0}(x - x_0)$ covers a
domain as large as possible where the function f(x) has a positive
derivative.

With (5.9), we saw that if a function f(x) has fast oscillations in the
neighborhood of $x_0$, then the regularity at $x_0$ depends upon the behavior of
Wf(s,x) below the cone of influence of $x_0$. To estimate this behavior, one
approach is to measure the decay of |Wf(s,x)| at the general maxima points
that are below the cone of influence of $x_0$, when we converge towards $x_0$.
Indeed, these general maxima points characterize the size of the oscillations
of f(x) and they give an upper bound for the value of the wavelet transform
along each maxima line. Theorem 3.4 proves that f(x) is Lipschitz $\alpha$ at $x_0$
only if $|Wf(s,x)| = O(|x - x_0|^{\alpha})$ below the cone of influence. Hence, f(x) can
be Lipschitz $\alpha$ at a point $x_0$ only if the general maxima points $(s_i, x_i)$ below
the cone of influence of $x_0$ satisfy

$$|Wf(s_i, x_i)| = O(|x_i - x_0|^{\alpha}). \qquad (5.14)$$

This necessary condition gives an upper bound on the Lipschitz exponents
at $x_0$. For f(x) = sin(1/x), (5.14) is satisfied only for $\alpha = 0$. We thus detect
the discontinuity at x = 0 from the values of the general maxima points.
In most situations, the general maxima points must be used in conjunction
with the local maxima lines in order to estimate the decay of |Wf(s,x)| inside
and below the cone of influence of $x_0$.

6.  Completeness  of  the  wavelet  maxima 

We proved that the singularities of a function can be detected from the
wavelet transform local maxima. One might wonder whether the positions
and the values of the wavelet transform maxima provide a complete and
stable representation of f(x). The reconstruction of a function from the local
maxima of its wavelet transform has been studied numerically by Zhong
and one of us [16]. Local maxima are detected only along a dyadic sequence
of scales $(2^j)_{j \in \mathbb{Z}}$ to obtain efficient numerical implementations. The
reconstruction algorithm recovers signals with a relative precision
approximately equal to $10^{-2}$. The remaining error is mostly concentrated in
the highest frequencies. More recently Meyer [22] proved that the wavelet
transform local maxima do not provide a complete signal representation.
He constructed different functions whose wavelet transforms have the same
local maxima at all scales. However, these functions mostly differ at high
frequencies and their relative $L^2(\mathbb{R})$ distance is of the same order as the
precision of the numerical reconstruction algorithm. This seems to indicate
that the wavelet transform local maxima representation is "complete" modulo a small
high frequency error that remains to be identified mathematically. This section
briefly reviews the properties of a dyadic wavelet transform as well
as the algorithm that approximates a function from its local maxima.
Section 7 describes an application to the suppression of white noise with a local
estimation of Lipschitz exponents.

We call dyadic wavelet transform the sequence of functions of the
variable x

$$\left(Wf(2^j, x)\right)_{j \in \mathbb{Z}}. \qquad (6.1)$$

Equation (2.3) implies that the Fourier transform of $Wf(2^j, x)$ is given by

$$\hat{W}f(2^j, \omega) = \hat{\psi}(2^j \omega)\,\hat{f}(\omega). \qquad (6.2)$$

The function f(x) can be reconstructed from its wavelet transform and the
reconstruction is stable [7, 16] if and only if there exist two constants A > 0
and B > 0 such that

$$\forall \omega \in \mathbb{R}, \quad A \le \sum_{j=-\infty}^{+\infty} |\hat{\psi}(2^j \omega)|^2 \le B. \qquad (6.3)$$

Let us denote by $\|Wf(2^j, x)\|$ the $L^2(\mathbb{R})$ norm of the function $Wf(2^j, x)$ along
the variable x. As a consequence of (6.3), by applying the Parseval theorem,
one can prove that a dyadic wavelet transform has finite energy

$$A\|f\|^2 \le \sum_{j=-\infty}^{+\infty} \|Wf(2^j, x)\|^2 \le B\|f\|^2. \qquad (6.4)$$

This means that $(Wf(2^j, x))_{j \in \mathbb{Z}}$ belongs to the Hilbert space $l^2(L^2)$ of
sequences of functions $(g_j(x))_{j \in \mathbb{Z}}$ that satisfy

$$\sum_{j=-\infty}^{+\infty} \|g_j\|^2 < +\infty.$$

Similarly to the continuous wavelet transform, the dyadic wavelet transform
is overcomplete. This means that an arbitrary sequence $(g_j(x))_{j \in \mathbb{Z}}$ is not a priori the
dyadic wavelet transform of some function $f \in L^2(\mathbb{R})$. The space V of all
dyadic wavelet transforms of functions in $L^2(\mathbb{R})$ is strictly included in $l^2(L^2)$.
An orthogonal projection from $l^2(L^2)$ onto V is defined by a reproducing
kernel equation similar to (2.5) [16].
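
Condition (6.3) can be checked numerically for a given wavelet. The sketch below does so for the first derivative of a Gaussian, an assumed example for which $|\hat{\psi}(\omega)|^2 = \omega^2 e^{-\omega^2}$ up to a constant factor; it is not the wavelet of Figure 5.1a, and the function names and truncation bounds are hypothetical.

```python
import numpy as np

def psi_hat_sq(omega):
    """|psi_hat(omega)|^2 for the first derivative of a Gaussian (up to a constant)."""
    return omega ** 2 * np.exp(-omega ** 2)

def dyadic_sum(omega, j_min=-60, j_max=60):
    """Truncated sum over j of |psi_hat(2^j omega)|^2."""
    j = np.arange(j_min, j_max + 1)[:, None]
    return psi_hat_sq((2.0 ** j) * np.atleast_1d(omega)[None, :]).sum(axis=0)

# The sum is invariant under omega -> 2 * omega, so one octave is enough.
omega = np.linspace(1.0, 2.0, 1001)
total = dyadic_sum(omega)
A, B = total.min(), total.max()
```

Here A and B are numerical estimates of the two constants in (6.3); a ratio B/A close to 1 indicates a well-conditioned reconstruction.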

If the wavelet satisfies the condition (6.3), the Lipschitz regularity of
a function is also characterized by the decay across scales of the wavelet
transform at the scales $(2^j)_{j \in \mathbb{Z}}$. Theorems 3.3 and 3.4 remain valid if we
restrict the scale to the sequence $(2^j)_{j \in \mathbb{Z}}$ [14]. We can thus characterize the
regularity of a function from the behavior of the wavelet transform local
maxima at the dyadic scales. The results and theorems of Section 5 are valid
if we restrict the scale parameter s to $(2^j)_{j \in \mathbb{Z}}$. Figure 6.1b is the dyadic
wavelet transform of the signal in Figure 6.1a, computed with the wavelet
shown in Figure 5.1a. The finest scale is limited by the resolution of the
original discrete signal. We also stop the decomposition at a finite largest
scale. In Figure 6.1b, the largest scale is 2". The information provided by the
dyadic wavelet transform at scales larger than 2“ is given by one function
[16], shown at the bottom. It carries the lower frequencies of f(x). Figure 6.1c
displays the local maxima of the wavelet transform. Each Dirac indicates
the position and value of $Wf(2^j, x)$ at a maxima location.


Figure 6.1b: Wavelet transform computed up to the scale 2".

Figure 6.1b gives the remaining low frequencies at scales larger than 2".

Figure 6.1c: Local maxima of the wavelet transform. At each scale,
each Dirac indicates the position and value of a wavelet transform
local maximum. We also keep the remaining low-frequency information
shown at the bottom.

Figure 6.1d: Signal reconstructed from the wavelet transform local
maxima shown in Figure 6.1c.

Since the wavelet is the first derivative of a smoothing function, the
wavelet transform maxima are located where the signal has sharp transitions.
They provide an adaptive description of the signal information. The more
irregularities in the signal, the more wavelet maxima. Let us now study
the completeness of this local maxima representation and briefly explain the
reconstruction algorithm introduced by Zhong and one of us [16]. We want
to characterize the set S of all possible wavelet transforms that have exactly
the same local maxima as the wavelet transform of f(x). The representation
is complete if and only if the set S is reduced to the wavelet transform of
f(x). Clearly S is included in the space V of all dyadic wavelet transforms.
The set S is also included in the set $\Gamma$ of all sequences of functions
$(g_j(x))_{j \in \mathbb{Z}}$ in $l^2(L^2)$ such that for each integer j, the local maxima of $g_j(x)$
occur at the same locations and have the same values as the local maxima of
$Wf(2^j, x)$. For each $j \in \mathbb{Z}$, we require that $g_j(x)$ belongs to the space
$H^1(\mathbb{R})$ of functions one-time differentiable in the sense of Sobolev so that
their local maxima are well defined. This other constraint is justified if the
wavelet $\psi(x) \in H^1(\mathbb{R})$ since it implies that $Wf(2^j, x) \in H^1(\mathbb{R})$. It is easy
to verify that

$$\Gamma \cap V = S.$$

If the representation is not complete, then the set S is not reduced to the
wavelet transform of f(x). One can still recover a good approximation
of this wavelet transform if the size of S is "small". The reconstruction
algorithm is based on alternating projections on the set $\Gamma$ and the Hilbert
space V. We begin with an arbitrarily chosen initial sequence of functions
$(g_j(x))_{j \in \mathbb{Z}}$ and then project this sequence successively on V and $\Gamma$, as
illustrated by Figure 6.2. If the discrete signal has a total of N samples, the
computational complexity of the projections on V and $\Gamma$ is $O(N \log N)$ [16].
The convergence of the alternating projection algorithm to the intersection
of $\Gamma$ and V is not proved. However, in all our numerical experiments, the
algorithm does converge fast. The root mean-square error to signal ratio of
the reconstructed signal is of the order of 5 × 10“^ after 20 iterations on the
projection operators [16]. Figure 6.1d is an example of a signal reconstructed
with 20 iterations. The differences with the original function are not visible
on the graph. If we increase the number of iterations, the reconstruction
error decreases but reaches a limit which is of the order of 10“^. This
limitation of precision is due to the non-completeness of the local maxima
representation. Meyer proved recently [22] that for some particular functions
f(x), one can find high frequency perturbations $\epsilon(x)$ such that $Wf(2^j, x)$ and
$W(f + \epsilon)(2^j, x)$ have the same local maxima at all scales $2^j$. This means that
the solution set S is not reduced to the wavelet transform of f(x). However,
the numerical experiments as well as the mathematical counter-examples
seem to indicate that S is small. A precise mathematical characterization
of the set S remains to be done. Once we recover a wavelet transform that
belongs to S, we reconstruct the corresponding signal by applying the inverse
wavelet transform operator. From a practical point of view, the numerical
precision of this reconstruction algorithm is sufficient for a large class of
signal processing applications. The next section describes an application
to denoising.
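The structure of the alternating projection loop can be illustrated on a toy problem. The sketch below is not the algorithm of [16]: the two projectors are deliberately simple stand-ins. The subspace V is replaced, as an assumption for illustration, by the finite sequences that are constant on consecutive pairs of samples, and the constraint set by the sequences taking prescribed values at a few indices. All names are hypothetical.

```python
def project_onto_v(g):
    """Orthogonal projection onto sequences constant on the pairs (2k, 2k+1)."""
    out = list(g)
    for k in range(0, len(g) - 1, 2):
        mean = 0.5 * (g[k] + g[k + 1])
        out[k] = mean
        out[k + 1] = mean
    return out


def project_onto_gamma(g, constraints):
    """Closest sequence that takes the prescribed values at the constrained indices."""
    out = list(g)
    for index, value in constraints.items():
        out[index] = value
    return out


def alternate_projections(g0, constraints, iterations=20):
    """Project successively on V and on the constraint set, starting from g0."""
    g = list(g0)
    for _ in range(iterations):
        g = project_onto_v(g)
        g = project_onto_gamma(g, constraints)
    return g


constraints = {0: 1.0, 3: -2.0}
result = alternate_projections([0.0, 0.0, 0.0, 0.0], constraints)
```

In this toy case the two sets intersect, and the iterates approach the intersection geometrically: the unconstrained sample of each pair moves halfway toward the prescribed value at every pass.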

7.  Signal  denoising  based  on  wavelet  maxima  in  one  dimension 

Figure 6.2: The reconstruction of the wavelet transform of f(x) is
done with alternating projections on the set $\Gamma$ that expresses the
constraints on the local maxima and on the space V of all dyadic
wavelet transforms. The wavelet transform of f(x) is included in the
intersection of $\Gamma$ and V.

The properties of a signal can be modified by processing its wavelet
transform maxima and then reconstructing the corresponding function. We
describe an application to denoising based on a local estimation of the signal
singularities. The most classical technique to remove white noise from a
signal is to convolve the signal with a Gaussian filter. For a large class of
important signals, the energy of the white noise dominates the signal at high
frequencies whereas the energy of the signal dominates the noise at low
frequencies. The Gaussian low-pass filtering attenuates the high frequencies
and keeps the low frequencies. As a consequence, a large portion of the noise
is removed but the sharp variations of the original signal are smoothed. The
fact that most of the signal energy is concentrated in low frequencies often
indicates that most of the singularities have Lipschitz exponents that are
positive. Our denoising algorithm discriminates the signal from the noise with
a local analysis of the singularity types.
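The smoothing side effect of Gaussian filtering is easy to reproduce. The following minimal sketch (function names, the kernel truncation at four standard deviations and the replicated borders are assumptions made for illustration) filters a unit step and measures the largest jump between neighboring samples.

```python
import math


def gaussian_smooth(signal, sigma, support=4.0):
    """Convolve the signal with a normalized, truncated Gaussian kernel."""
    radius = int(math.ceil(support * sigma))
    taps = [math.exp(-0.5 * (m / sigma) ** 2) for m in range(-radius, radius + 1)]
    norm = sum(taps)
    taps = [t / norm for t in taps]
    last = len(signal) - 1
    out = []
    for k in range(len(signal)):
        acc = 0.0
        for m in range(-radius, radius + 1):
            acc += signal[min(max(k - m, 0), last)] * taps[m + radius]
        out.append(acc)
    return out


step = [0.0] * 50 + [1.0] * 50
smoothed = gaussian_smooth(step, sigma=3.0)
# Largest difference between two neighboring samples after filtering.
sharpest_jump = max(smoothed[k + 1] - smoothed[k] for k in range(len(smoothed) - 1))
```

The original step jumps by 1 between two samples. After filtering, the largest jump equals the central kernel coefficient, so the discontinuity is spread over several samples, which is the loss of sharp variations mentioned above.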

Let us first describe the properties of the wavelet transform of white
noise. Let n(x) be a real white noise random process and Wn(s,x) be its
wavelet transform. We denote by E(X) the expected value of a random
variable X. We suppose that the wavelet $\psi(x)$ is real. Grossmann et al. [10]
have shown that the decay of $E(|Wn(s,x)|^2)$ is proportional to 1/s. Indeed,

$$|Wn(s,x)|^2 = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} n(u)\, n(v)\, \psi_s(x-u)\, \psi_s(x-v)\, du\, dv.$$

Since n(x) is a white noise, $E(n(u)n(v)) = \delta(u-v)$, hence

$$E(|Wn(s,x)|^2) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} \delta(u-v)\, \psi_s(x-u)\, \psi_s(x-v)\, du\, dv.$$

We thus derive that

$$E(|Wn(s,x)|^2) = \frac{\|\psi\|^2}{s}. \qquad (7.1)$$
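The 1/s decay in (7.1) can be checked numerically. The sketch below assumes the normalization psi_s(x) = (1/s) psi(x/s) and, purely for illustration, a derivative-of-Gaussian wavelet whose squared norm is 1/(4 sqrt(pi)). It evaluates the integral of psi_s(x-u)^2 over u, which is what remains of the double integral once the Dirac is integrated out. Function names are hypothetical.

```python
import math


def psi(x):
    """Derivative of a unit Gaussian, used here as an example wavelet."""
    return -x * math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def noise_variance(scale, du=1e-3, half_width=10.0):
    """Riemann sum of the integral of psi_s(u)^2 du, with psi_s(x) = psi(x/s)/s."""
    n = int(round(half_width / du))
    step = scale * du  # integration step in the original variable
    total = 0.0
    for k in range(-n, n + 1):
        value = psi(k * du) / scale  # psi_s evaluated at k * step
        total += value * value
    return total * step


PSI_NORM_SQ = 1.0 / (4.0 * math.sqrt(math.pi))  # squared L2 norm of psi
variances = {s: noise_variance(s) for s in (1.0, 2.0, 4.0, 8.0)}
```

Multiplying each computed variance by its scale gives the same constant, the squared norm of the wavelet, in agreement with (7.1).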

At a given scale s, the wavelet transform Wn(s,x) is a random process
in x. If we suppose that the white noise n(x) is Gaussian white noise
then Wn(s,x) is also a Gaussian process. From this property, we prove in
Appendix D that at a scale s, the density of local maxima of the wavelet
transform is

$$\frac{\lambda\, \|\psi^{(2)}\|}{\pi\, \|\psi^{(1)}\|\, s}, \qquad (7.2)$$

where $\psi^{(n)}(x)$ is the $n$th derivative of $\psi(x)$ and $\lambda$ a constant between 0.3
and 1. The density of local maxima is inversely proportional to the scale s.
The realization of white noise is a distribution which is almost everywhere
singular. One can prove that the singularities of Gaussian white noise are
Lipschitz $-1/2$. Figure 7.1a is a signal obtained by adding Gaussian white
noise of variance 1 to the signal of Figure 6.1a. Figure 7.1b shows its dyadic
wavelet transform.
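The inverse proportionality between the density of maxima and the scale can be observed on simulated noise. The sketch below is only an experiment under stated assumptions: discrete Gaussian white noise of 4096 samples with a fixed seed, a derivative-of-Gaussian wavelet, and maxima counted on the samples where the widest kernel fits entirely. It does not evaluate the constant of (7.2), and all names are hypothetical.

```python
import math
import random


def dog_taps(scale, support=4.0):
    """Samples of psi_s(m) = (1/s) psi(m/s), psi being the derivative of a unit Gaussian."""
    radius = int(math.ceil(support * scale))
    taps = [-(m / scale) * math.exp(-0.5 * (m / scale) ** 2)
            / (scale * math.sqrt(2.0 * math.pi))
            for m in range(-radius, radius + 1)]
    return radius, taps


def count_maxima(noise, scale, margin):
    """Number of local maxima of |Wn(scale, x)| away from the borders."""
    radius, taps = dog_taps(scale)
    w = []
    for k in range(margin, len(noise) - margin):
        acc = 0.0
        for m in range(-radius, radius + 1):
            acc += noise[k - m] * taps[m + radius]
        w.append(acc)
    count = 0
    for k in range(1, len(w) - 1):
        a = abs(w[k])
        if a > abs(w[k - 1]) and a >= abs(w[k + 1]):
            count += 1
    return count


random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(4096)]
margin = 32  # radius of the widest kernel (scale 8)
counts = {s: count_maxima(noise, s, margin) for s in (2, 4, 8)}
```

On such a realization the number of maxima should drop by roughly a factor 2 each time the scale doubles, up to statistical fluctuations and discretization effects at the finest scale.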


Figure 7.1a: Signal of Figure 6.1a to which we added Gaussian white
noise of variance 1.

Figure 7.1b: Wavelet transform computed up to the scale 2'^.

Figure 7.1c: Local maxima of the wavelet transform. At coarser
scales the maxima of the signal discontinuities dominate the maxima
of the white noise.

Let us suppose that the original signal has isolated singularities whose
Lipschitz regularities are positive. Since the noise creates singularities whose
Lipschitz regularity is negative, we can discriminate the local maxima
created by the white noise from the ones produced by the signal, by looking at
the evolution of their amplitude across scales. If the local maxima have an
amplitude which increases when the scale decreases, it indicates that the
corresponding singularities have negative Lipschitz exponents. These maxima
are mostly dominated by the white noise and thus are removed. At the
locations where the signal has singularities with positive Lipschitz exponents,
the noise adds singularities with negative Lipschitz exponents.
Mathematically, the sum is a signal whose singularities have negative Lipschitz
exponents. However, if the signal dominates the noise at low frequencies,
wherever the signal is singular, at coarse scales the amplitude of the local
maxima is mostly influenced by the signal variations. Since the signal
singularities have positive Lipschitz exponents, at coarse scales the amplitude
of the corresponding maxima does not increase when the scale decreases. This
can be observed in the neighborhood of the discontinuities of the noisy signal
shown in Figures 7.1a through 7.1c.
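The amplitude test described above can be sketched as a slope estimate. If the amplitude of a maximum behaves like $A\,(2^j)^{\alpha}$ along a maxima line, then the logarithm in base 2 of the amplitude is an affine function of j with slope $\alpha$. The helper below fits that slope by least squares. The function names and the small numerical margin are assumptions made for illustration, not the selection rule of [16].

```python
import math


def amplitude_slope(amplitudes, first_j=1):
    """Least-squares slope of log2|amplitude| versus j along a maxima line."""
    js = [first_j + i for i in range(len(amplitudes))]
    logs = [math.log2(abs(a)) for a in amplitudes]
    mean_j = sum(js) / len(js)
    mean_log = sum(logs) / len(logs)
    num = sum((j - mean_j) * (l - mean_log) for j, l in zip(js, logs))
    den = sum((j - mean_j) ** 2 for j in js)
    return num / den


def looks_like_noise(amplitudes, margin=1e-9):
    """True when the amplitude increases on average as the scale decreases."""
    return amplitude_slope(amplitudes) < -margin


step_chain = [0.4, 0.4, 0.4, 0.4]                        # constant amplitude
noise_chain = [2.0 ** (-0.5 * j) for j in range(1, 5)]   # amplitude grows at fine scales
```

A constant amplitude across scales gives a zero slope and is kept, whereas an amplitude that grows toward fine scales gives a negative slope and is flagged as noise.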

In order to evaluate the behavior of the wavelet maxima across scales,
we need to make a correspondence between the maxima that appear at
different scales $2^j$. We say that a maximum at a scale $2^j$ propagates to another
maximum at the coarser scale $2^{j+1}$ if both maxima belong to the same maxima
line in the scale space (s, x). Equation (7.2) proves that for a white noise,
on average, the number of maxima decreases by a factor 2 when the scale
increases by a factor 2. Half of the maxima do not propagate from the scale $2^j$
to the scale $2^{j+1}$. In order to find which maxima propagate to the next
scale, one should compute the wavelet transform on a dense sequence of
scales. However, with a simple ad-hoc algorithm one can still try to find
which maxima propagate to the next scale, by looking at their value and
position with respect to other maxima at the next scale. The propagation
algorithm supposes that the maxima that propagate from a scale $2^j$ to a
coarser scale $2^{j+1}$ are the ones which locally have the largest amplitude
and which have a location which is close to a maximum at the scale $2^j$, whose
amplitude has the same sign. Such an ad-hoc algorithm is not exact but saves
computations since we do not need to compute the wavelet transform at any
other scale. The denoising algorithm removes all maxima whose amplitude
increases on average when the scale decreases, or which do not propagate
to larger scales. These are the local maxima that are mostly influenced by
the noise fluctuations. Figure 7.3a shows the local maxima that are kept by
the denoising algorithm. As expected, these local maxima correspond to the
signal discontinuities. The position and amplitude of the remaining local
maxima are affected by the white noise components in the corresponding
neighborhood. The white noise introduces more distortions at fine scales
because the signal to noise ratio is smaller. The maxima selection algorithm
is based on an analysis of singularity types and thus cannot be used to
discriminate the low-frequency sinusoidal components of the signal from
the white noise. Hence, we do not try to select local maxima below the scale
2’', and keep both the signal and the noise components below this scale. This
non-linear filtering algorithm, like a Gaussian smoothing, does not modify
the lowest frequencies but it selectively removes the fine scale components
depending upon the local singularity types.
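A possible form of the ad-hoc propagation step is sketched below. It is a simplified reading of the description above, not the procedure of [16]: each maximum at the finer scale is linked to the nearest maximum of the same sign at the coarser scale within a window `max_shift`, and when several fine maxima compete for the same coarse maximum only the one of largest amplitude is kept. The function name, the window parameter and the tie-breaking rule are assumptions.

```python
def propagate(fine, coarse, max_shift):
    """Link maxima of a fine scale to maxima of the next coarser scale.

    fine, coarse: lists of (position, value) pairs.
    Returns a dict mapping each fine index to a coarse index, or to None
    when the maximum does not propagate.
    """
    links = {}
    for i, (pos, val) in enumerate(fine):
        best = None
        for k, (cpos, cval) in enumerate(coarse):
            if (val > 0) != (cval > 0):
                continue  # amplitudes must have the same sign
            dist = abs(cpos - pos)
            if dist <= max_shift and (best is None or dist < abs(coarse[best][0] - pos)):
                best = k
        links[i] = best

    # Among the fine maxima linked to the same coarse maximum, keep the largest.
    winner = {}
    for i, k in links.items():
        if k is None:
            continue
        if k not in winner or abs(fine[i][1]) > abs(fine[winner[k]][1]):
            winner[k] = i
    return {i: (k if k is not None and winner[k] == i else None)
            for i, k in links.items()}


fine = [(10, 0.5), (12, 0.1), (40, -0.3), (70, 0.2)]
coarse = [(11, 0.6), (41, -0.2), (72, -0.4)]
links = propagate(fine, coarse, max_shift=3)
```

In this example the maxima at positions 10 and 40 propagate, the weaker maximum at 12 loses the competition for the coarse maximum at 11, and the maximum at 70 finds no coarse maximum of the same sign nearby.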

After the maxima selection, we reconstruct a "denoised" signal with
the alternating projection algorithm previously described. A priori, there is
no guarantee that there exists a function whose wavelet transform has local
maxima that correspond exactly to the maxima that we selected. This means
that the set $\Gamma$ that characterizes the maxima constraints might not intersect
the space V of all wavelet transforms (see Figure 7.2). The reconstruction
algorithm thus does not converge but if we stop after enough iterations (20
in practice), we reconstruct a sequence of functions which is close to $\Gamma$ and
V. The function shown in Figure 7.3b was obtained after 20 such iterations.
As can be observed, the two discontinuities of the original function are still
perfectly sharp. The overshoot is due to the white noise components that
modified the values and positions of the original local maxima at these
locations. In the smooth signal variations, we can see the remaining components
of the white noise that have been kept at scales larger than 2“*. This simple
algorithm shows the feasibility of discriminating a signal from its noise with
an analysis of the local maxima behavior across scales. Better strategies for
selecting the maxima can certainly be developed depending upon the
applications. This denoising procedure does not require that the noise be white
but only that its singularities have Lipschitz exponents that can be differentiated
from those of the signal singularities.


Figure 7.2: After a modification of the local maxima, in general there
is no wavelet transform whose local maxima are exactly equal to the
ones that we selected. Hence, the set $\Gamma$ that carries the constraints on
local maxima does not intersect the space V of all dyadic wavelet
transforms. The algorithm reconstructs a sequence of functions that
is close to $\Gamma$ and V.


8.  Conclusion 

Figure 7.3a: Local maxima kept by the denoising algorithm.

Figure 7.3b: Signal reconstructed from the local maxima shown in
Figure 7.3a. The overshoot at the discontinuity locations is due to
the modification of the maxima amplitude by the white noise.

We proved that the wavelet transform local maxima detect all the
singularities of a function and we described strategies to measure their Lipschitz
regularity. This mathematical study provides algorithms for characterizing
singularities of irregular signals such as the multifractal structures observed
in physics [23]. Oscillations can also be measured from the general maxima
of the wavelet transform, with a technique similar to the approach of Escudie
and Torresani [9].

From  a  numerical  point  of  view,  it  is  possible  to  reconstruct  a  close 
approximation  of  a  signal  from  the  local  maxima  of  its  wavelet  transform. 
We studied an application to signal denoising. The prior information on the
regularity of a signal versus the local properties of the noise is expressed
through constraints on the behavior of the wavelet transform local maxima.




The local maxima model has been extended to two dimensions in order
to detect edges in images [16]. As in one dimension, images can be
reconstructed from the wavelet transform local maxima. This representation of
images with multiscale edges has applications in pattern recognition as well
as compact image coding. An algorithm that selects the important edges for
building a compact image code is described in [16].

9.  Bibliography 

[1] F. Argoul, A. Arneodo, J. Elezgaray, and G. Grasseau. Wavelet analysis
of fractal growth process. In Proc. of 4th EPS Liquid State Conf.,
Arcachon, France, May 1988.

[2] A. Arneodo, G. Grasseau, and M. Holschneider. On the wavelet
transform of multifractals. In Combes et al., editors, Wavelets.
Springer-Verlag, Berlin, Heidelberg, New York, London, Paris, Tokyo, Hong
Kong, 1988.

[3] E. Bacry. Transformation en ondelettes et turbulence pleinement
developpee. In Rapport de Magistere, Univ. Paris VII, 1989.

[4] E. Bacry, A. Arneodo, U. Frisch, Y. Gagne, and E. Hopfinger. Wavelet
analysis of fully developed turbulence data and measurement of scaling
exponents. In M. Lesieur and O. Metais, editors, Turbulence and Coherent
Structure. Kluwer Academic Publishers Group, Dordrecht, Boston,
London, 1990. To appear.

[5] J. Bony. Propagation et interaction des singularites pour les solutions
des equations aux derivees partielles non-lineaires. In Proc. of the
International Congress of Mathematicians, pages 1133-1147, Warszawa,
1983.

[6] J. Canny. A computational approach to edge detection. IEEE Trans. on
Pattern Anal. and Machine Intel., 8:679-698, 1986.

[7] I. Daubechies. The wavelet transform, time-frequency localization and
signal analysis. IEEE Trans. Information Theory, 36:961-1003,
September 1990.

[8] N. Delprat, B. Escudie, P. Guillemain, R. Kronland-Martinet, Ph.
Tchamitchian, and B. Torresani. Asymptotic wavelet and Gabor
analysis: extraction of instantaneous frequencies. Technical Report
CPT-91/P.2512, CPT, CNRS Luminy, Marseilles, February 1991.

[9] B. Escudie and B. Torresani. Wavelet representation and time-scaled
matched receiver for asymptotic signals. In Proc. of 5th EUSIPCO Conf.,
pages 305-308, Barcelona, Spain, 1990.

[10] A. Grossmann. Wavelet transform and edge detection. In P. Blanchard,
L. Streit, and M. Hazewinkel, editors, Stochastic Processes in Physics
and Engineering. D. Reidel, Dordrecht, Boston, Lancaster, Tokyo, 1986.

[11] A. Grossmann and J. Morlet. Decomposition of Hardy functions into
square integrable wavelets of constant shape. SIAM J. Math.,
13:723-736, 1984.

[12] M. Holschneider, R. Kronland-Martinet, J. Morlet, and P. Tchamitchian.
A real-time algorithm for signal analysis with the help of the wavelet
transform, 1988. Preprint.

[13] M. Holschneider and P. Tchamitchian. Regularite locale de la fonction
non-differentiable de Riemann. In P.G. Lemarie, editor, Les ondelettes
en 1989, Lecture Notes in Mathematics. Springer-Verlag, Berlin,
Heidelberg, New York, London, Paris, Tokyo, Hong Kong, 1989.

[14] S. Jaffard. Exposants de Holder en des points donnes et coefficients
d'ondelettes. Notes au Compte-Rendu de l'Academie des Sciences,
308, serie 1:79-81, 1989.

[15] S. Jaffard. Pointwise smoothness, two microlocalisation and wavelet
coefficients. Publicacions Matematiques, 35, 1991.

[16] S. Mallat and S. Zhong. Characterization of signals from multiscale
edges. Computer science tech. report, NYU, December 1991.

[17] B. Mandelbrot. The Fractal Geometry of Nature. W.H. Freeman, San
Francisco, New York, 1983.


[18] D. Marr. Vision. W.H. Freeman, 1982.

[19] D. Marr and E. Hildreth. Theory of edge detection. Proc. of the Royal
Society of London, 207:187-217, 1980.

[20] Y. Meyer. Ondelettes et Operateurs.

[21] Y. Meyer. La 2-microlocalisation.

[22] Y. Meyer. Un contre-exemple a la conjecture de Marr et a celle de
S. Mallat. Preprint, 1991.

[23] J.F. Muzy, E. Bacry, and A. Arneodo. Wavelets and multifractal
formalism for singular signals: application to turbulence data. Submitted to
Physical Review Letters, July 1991.

[24] A. Papoulis. Probability, Random Variables, and Stochastic Processes.
McGraw-Hill, 1984.

[25] A. Rosenfeld and M. Thurston. Edge and curve detection for visual
scene analysis. IEEE Trans. on Computers, C-20:562-569, 1971.

[26] F. Treves. Topological Vector Spaces, Distributions and Kernels.
Academic Press.

[27] A. Witkin. Scale space filtering. In Proc. Int. Joint Conf. Artificial Intell.,
1983.

[28] S. Zhong. Edges representation from wavelet transform maxima. PhD
thesis, New York University, New York, September 1990.

[29] A. Zygmund. Trigonometric Series. Cambridge University Press,
Cambridge, U.K., 1968.


A.  Proof  of  Theorem  5.2 

We prove Theorem 5.2 by proving by induction the following proposition.

Proposition A.1 ($(P_n)$). Let $\psi(x)$ be a wavelet that can be written
$\psi(x) = \frac{d^n \phi(x)}{dx^n}$, where $\phi(x)$ is a continuous function of compact support.
Let f(x) be a function and we suppose that for any $\epsilon > 0$, there exists a
constant $K_\epsilon$, such that at all scales s

$$\int_{a+\epsilon}^{b-\epsilon} |f * \phi_s(x)|\, dx \le K_\epsilon. \qquad (A.1)$$

If Wf(s,x) has no maxima for $x \in\, ]a, b[$ and $s < s_0$, then for any $\epsilon > 0$, there
exists a constant $A_{\epsilon,n}$ such that for any $x \in\, ]a+\epsilon, b-\epsilon[$ and $s < s_0$,

$$|Wf(s,x)| \le A_{\epsilon,n}\, s^n. \qquad (A.2)$$
If we modify f(x) by multiplying it by the indicator function of [a, b],
we do not modify its regularity on any interval $[a+\epsilon, b-\epsilon]$. We shall thus
suppose that f(x) = 0 for $x \notin [a, b]$. Let us first prove that (A.1) is satisfied.
Since $f(x) \in L^1([a, b])$ and f(x) = 0 for $x \notin [a, b]$,

$$\int_{-\infty}^{+\infty} |f * \phi_s(x)|\, dx \le \int_a^b |f(x)|\, dx \int_{-\infty}^{+\infty} |\phi_s(x)|\, dx.$$

With a change of variable in the integral we obtain

$$\int_{-\infty}^{+\infty} |\phi_s(x)|\, dx = \int_{-\infty}^{+\infty} |\phi(x)|\, dx.$$

Hence, $\int |f * \phi_s(x)|\, dx$ is bounded by a constant independent of the scale s,
as in (A.1). In order to prove the proposition $(P_n)$ for n = 1, we introduce a
lemma.

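Lemma A.2. Let [c, d] be an interval of $\mathbb{R}$. Let K be a positive constant. Let
g(x) be a function which satisfies

$$\int_c^d |g(x)|\, dx \le K, \qquad (A.3)$$

and such that $\left|\frac{dg(x)}{dx}\right|$ has no local maxima on $]c, d[$. Let $\beta > 0$ with
$\beta < (d-c)/4$. There exist two constants $B_\beta$ and $C_\beta$ such that

$$\forall x \in [c+\beta, d-\beta], \quad |g(x)| \le B_\beta, \qquad (A.4)$$

$$\forall x \in [c+\beta, d-\beta], \quad \left|\frac{dg(x)}{dx}\right| \le C_\beta. \qquad (A.5)$$

The constants $B_\beta$ and $C_\beta$ only depend upon $\beta$, $d-c$ and K.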

We denote $g'(x) = \frac{dg(x)}{dx}$. Although quite simple, this proof is long
because it includes many sub-cases. We prove (A.4) and then (A.5). In the
following, we only consider the values of g(x) over the interval $]c, d[$. We
first have two cases. Since |g'(x)| has no local maximum, either g'(x) has a
constant sign or g'(x) is monotonic.

1) If we suppose that g'(x) has a constant sign then g(x) is monotonic.
Equation (A.3) yields

$$\int_c^{c+\beta} |g(x)|\, dx \le K \quad \text{and} \quad \int_{d-\beta}^{d} |g(x)|\, dx \le K. \qquad (A.6)$$

Since g(x) is monotonic on $]c, d[$, these integral constraints imply that

$$|g(c+\beta)| \le \frac{K}{\beta} \quad \text{and} \quad |g(d-\beta)| \le \frac{K}{\beta}. \qquad (A.7)$$

To prove (A.7) one must distinguish several cases. For example if
g'(x) is positive and g(x) remains positive, the second integral of (A.6)
implies that $|g(d-\beta)| \le K/\beta$ and since $|g(c+\beta)| \le |g(d-\beta)|$, (A.7) is
valid. The other cases are treated similarly. Since g(x) is monotonic,
$|g(x)| \le \mathrm{Max}(|g(c+\beta)|, |g(d-\beta)|)$, hence (A.4) is satisfied for $B_\beta \ge K/\beta$.

2) Let us suppose that g'(x) is monotonic, for example that it decreases.
The function g(x) is concave. The same proof is valid for a convex
function.


a) We first suppose that g(x) does not change sign on $[c+\beta, d-\beta]$.

i) If g(x) is negative, since it is concave,
$|g(x)| \le \mathrm{Max}(|g(c+\beta)|, |g(d-\beta)|)$ for $x \in\, ]c+\beta, d-\beta[$. Since g'(x) is
monotonically decreasing, either it is positive at all points of $[c, c+\beta]$
or it is negative at all points of $[c+\beta, d]$. We know that g(x)
remains negative and

$$\int_c^{c+\beta} |g(x)|\, dx \le K, \qquad \int_{c+\beta}^{d} |g(x)|\, dx \le K.$$

We can thus derive that

$$|g(c+\beta)| \le \mathrm{Max}\left(\frac{K}{\beta}, \frac{K}{d-c-\beta}\right).$$

Since $\beta \le (d-c)/4$, we obtain $|g(c+\beta)| \le K/\beta$. Similarly
we can prove that $|g(d-\beta)| \le K/\beta$. Hence $|g(x)| \le K/\beta$.

ii) If g(x) remains positive, there exists $e \in\, ]c+\beta, d-\beta[$ such
that $g(x) \le g(e)$ for all $x \in\, ]c+\beta, d-\beta[$. Since g(x) is
concave, one can derive that

$$K \ge \int_{c+\beta}^{d-\beta} g(x)\, dx \ge \frac{g(e)\,(d-c-2\beta)}{2}.$$

Since $\beta < (d-c)/4$, we obtain $g(e) \le 4K/(d-c)$. Hence
$|g(x)| \le 4K/(d-c)$.


b) Let us now suppose that g(x) changes sign over $[c+\beta, d-\beta]$.
Either both $g(c+\beta)$ and $g(d-\beta)$ are negative or only one of them
is negative. We only consider the case where both are negative.
The other case can be treated with the same approach. Since g(x)
is concave, it has two zero-crossings at the locations $z_0$ and $z_1$,
$z_0 < z_1$. For $x \in [c+\beta, z_0[\, \cup\, ]z_1, d-\beta]$, g(x) is negative and
$|g(x)| \le \mathrm{Max}(|g(c+\beta)|, |g(d-\beta)|)$. Over $[c, c+\beta]$ and $[d-\beta, d]$,
g(x) is monotonic. With the same argument as in 1), we prove that

$$|g(c+\beta)| \le K/\beta \quad \text{and} \quad |g(d-\beta)| \le K/\beta.$$

For $x \in\, ]z_0, z_1[$, g(x) > 0 and there exists $e \in\, ]z_0, z_1[$ such that
$g(x) \le g(e)$ for $x \in\, ]z_0, z_1[$. We must prove that g(e) is bounded.
Since g(x) is concave over $[z_0, z_1]$, one can derive that

$$K \ge \int_{z_0}^{z_1} g(x)\, dx \ge \frac{g(e)\,(z_1 - z_0)}{2}. \qquad (A.8)$$

Let us suppose that $g(e) \ge K/\beta$. Let l(x) be the affine function
which crosses 0 at the abscissa $z_0$, and is equal to g(e) at the
abscissa e. Before the abscissa $z_0$, l(x) is negative and $l(x) \ge g(x)$
because g(x) is concave. Hence, $|l(c+\beta)| \le |g(c+\beta)| \le K/\beta$. We
know that

$$\frac{|l(c+\beta)|}{|l(e)|} = \frac{z_0 - c - \beta}{e - z_0}.$$

Since $|l(c+\beta)| \le K/\beta$ and $l(e) = g(e) \ge K/\beta$, we obtain

$$e - z_0 \ge z_0 - c - \beta.$$

With the same argument applied to the second zero-crossing
$z_1$ and $d-\beta$, we can also prove that

$$z_1 - e \ge d - \beta - z_1.$$

Adding these two equations yields

$$z_1 - z_0 \ge \frac{d - c - 2\beta}{2} \ge \frac{d-c}{4}.$$

If we insert this equation into (A.8), we obtain

$$g(e) \le \frac{8K}{d-c}.$$

Hence, $g(e) \le \mathrm{Max}(8K/(d-c), K/\beta)$. This last case finishes
the proof of (A.4) of Lemma A.2 for a constant $B_\beta$ such that
$B_\beta \ge \mathrm{Max}(8K/(d-c), K/\beta)$.
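The concavity inequality used in (A.8) says that the area under a positive concave arch is at least the area of the triangle with the same base and the same height. The short numerical check below illustrates it on three example functions chosen for illustration. The function name and the midpoint quadrature are assumptions, and this is a sanity check, not part of the proof.

```python
import math


def concavity_bound(g, z0, z1, samples=2000):
    """Midpoint-rule integral of g on [z0, z1] and the triangle lower bound."""
    h = (z1 - z0) / samples
    values = [g(z0 + (k + 0.5) * h) for k in range(samples)]
    integral = sum(values) * h
    peak = max(values)  # approximates g(e), the maximum of g on the interval
    triangle = 0.5 * peak * (z1 - z0)
    return integral, triangle


# Concave functions vanishing at both ends of the interval.
parabola = concavity_bound(lambda x: x * (1.0 - x), 0.0, 1.0)
sine = concavity_bound(math.sin, 0.0, math.pi)
tent = concavity_bound(lambda x: 1.0 - abs(x), -1.0, 1.0)  # equality case
```

The tent function is the extremal case: its integral equals the triangle area exactly, which is why the factor 1/2 in (A.8) cannot be improved.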


Let us now prove that g'(x) is bounded. Since |g'(x)| has no maxima on
the interval $[c+\beta/2, d-\beta/2]$, we know that
$|g'(x)| \le \mathrm{Max}(|g'(c+\beta)|, |g'(d-\beta)|)$ for $x \in [c+\beta, d-\beta]$. Let us suppose for
example that $|g'(c+\beta)| \ge |g'(d-\beta)|$. Then, |g'(x)| is monotonically decreasing
on $[c+\beta/2, c+\beta]$ and g'(x) does not change sign over this interval. Hence,

$$|g'(c+\beta)| \le \frac{2}{\beta} \left| \int_{c+\beta/2}^{c+\beta} g'(x)\, dx \right| = \frac{2}{\beta}\, |g(c+\beta/2) - g(c+\beta)| \le \frac{4}{\beta}\, B_{\beta/2}.$$

Since $|g'(x)| \le \mathrm{Max}(|g'(c+\beta)|, |g'(d-\beta)|)$ for $x \in [c+\beta, d-\beta]$, we derive
that |g'(x)| is bounded by a constant $C_\beta$ which only depends upon $\beta$, $d-c$
and K.


Lemma A.3. Let [c, d] be an interval of $\mathbb{R}$. Let K be a positive constant. Let
g(x) be a function which satisfies

$$\int_c^d |g(x)|\, dx \le K,$$

and such that $\left|\frac{d^2 g(x)}{dx^2}\right|$ has no local maxima on $]c, d[$. Let $\beta > 0$ with
$\beta < (d-c)/4$. There exists a constant $D_\beta$ that only depends upon $\beta$, $d-c$ and
K, such that

$$\forall x \in [c+\beta, d-\beta], \quad \left|\frac{d^2 g(x)}{dx^2}\right| \le D_\beta. \qquad (A.9)$$

The proof of this lemma is mostly the same as that for Lemma A.2 and
we leave it to the reader.

Let us now prove that the proposition $(P_n)$ is true for n = 1. Since
$\psi(x) = \frac{d\phi(x)}{dx}$, we derive that

$$Wf(s,x) = s\, \frac{d}{dx}(f * \phi_s)(x).$$

Our hypothesis supposes that $g(x) = f * \phi_s(x)$ satisfies (A.3) of
Lemma A.2 for $c = a + \epsilon/2$ and $d = b - \epsilon/2$. The result of this lemma for
$\beta = \epsilon/2$ and $s < s_0$ yields

$$|Wf(s,x)| \le s\, C_{\epsilon/2}.$$

This concludes the proof of (A.2) for n = 1. The proof of $(P_n)$ for n = 2 is
based on Lemma A.3. Since $\psi(x) = \frac{d^2\phi(x)}{dx^2}$, we derive that

$$Wf(s,x) = s^2\, \frac{d^2}{dx^2}(f * \phi_s)(x).$$

We can apply the result of Lemma A.3 to $g(x) = f * \phi_s(x)$, with $\beta = \epsilon/2$,
$c = a + \epsilon/2$ and $d = b - \epsilon/2$. Equation (A.9) yields

$$|Wf(s,x)| \le s^2\, D_{\epsilon/2},$$

which finishes the proof of $(P_n)$ for n = 2.

Let us now prove that if $(P_n)$ is true, for $n \ge 2$, then $(P_{n+1})$ is also true.
Let $\psi(x)$ be a wavelet with n + 1 vanishing moments and f(x) a function
that satisfies (A.1). The wavelet $\psi(x)$ can be written $\psi(x) = \frac{d\chi(x)}{dx}$, where the
wavelet $\chi(x)$ has n vanishing moments. Let $\frac{df(x)}{dx}$ be the derivative of f(x)
in the sense of distributions. Then

$$Wf(s,x) = s \left( \frac{df}{dx} * \chi_s \right)(x). \qquad (A.10)$$

In order to apply our induction hypothesis $(P_n)$ to $\frac{df(x)}{dx}$ with respect to the
wavelet $\chi(x)$, we need to prove that for any $\epsilon > 0$, there exists a constant $K_\epsilon$
such that at all scales s

$$\int_{a+\epsilon}^{b-\epsilon} \left| \frac{df}{dx} * \phi_s(x) \right| dx \le K_\epsilon. \qquad (A.11)$$

Since the wavelet $\psi(x)$ has more than two vanishing moments, the
proposition $(P_2)$, that we just proved, implies that for any $\epsilon > 0$, if $x \in\, ]a+\epsilon, b-\epsilon[$,

$$|Wf(s,x)| \le s^2 A_{\epsilon,2}.$$

From Theorem 3.3 we derive that f(x) is uniformly Lipschitz $\alpha$ on the
intervals $]a+\epsilon, b-\epsilon[$, for any $\alpha < 2$. Hence, $\frac{df(x)}{dx}$ is uniformly bounded
on any such interval. One can then easily derive that the condition (A.11)
is satisfied. Let us now apply the induction hypothesis $(P_n)$ to $\frac{df(x)}{dx}$ with
respect to the wavelet $\chi(x)$. There exists a constant $A_{\epsilon,n}$ such that for any
$x \in\, ]a+\epsilon, b-\epsilon[$ and $s < s_0$,

$$\left| \frac{df}{dx} * \chi_s(x) \right| \le A_{\epsilon,n}\, s^n.$$

Equation (A.10) implies that

$$|Wf(s,x)| \le A_{\epsilon,n}\, s^{n+1}.$$

This finishes the proof of $(P_{n+1})$.

By applying Theorem 3.3 to the statement $(P_n)$, we derive that the
function f(x) is Lipschitz $\alpha$ for any $\alpha < n$. For $\alpha = n$, Theorem 3.3 does not
apply because it is an integer Lipschitz exponent.

Let us now prove that (A.2) implies that f(x) is Lipschitz n if the wavelet
$\psi(x)$ can be written

$$\psi(x) = \frac{d^n \theta(x)}{dx^n}, \qquad (A.12)$$

where $\theta(x)$ is a smoothing function. Let $\frac{d^n f(x)}{dx^n}$ be the $n$th derivative of f(x)
in the sense of distributions. Similarly to (A.10), (A.12) yields

$$Wf(s,x) = s^n \left( \frac{d^n f}{dx^n} * \theta_s \right)(x).$$

Equation (A.2) of the proposition $(P_n)$ implies that for any $\epsilon > 0$ there exists
a constant $A_{\epsilon,n}$ such that for any $x \in\, ]a+\epsilon, b-\epsilon[$ and $s < s_0$

$$\left| \frac{d^n f}{dx^n} * \theta_s(x) \right| \le A_{\epsilon,n}.$$

Since the integral of $\theta(x)$ is nonzero, this equation implies that $\frac{d^n f(x)}{dx^n}$ is a
function which is bounded by $A_{\epsilon,n}$ over the interval $]a+\epsilon, b-\epsilon[$. Hence
f(x) is uniformly Lipschitz n over the interval $]a+\epsilon, b-\epsilon[$.


B.  Proof  of  Theorem  5.3 

We first derive from Theorem 5.2 that $f(x)$ is Lipschitz $n$ at all points different from $x_0$. Let $x_1 \in ]a, x_0[$. For $s < s_0$, $|Wf(s,x)|$ has maxima only in a cone pointing to $x_0$. Hence, for $\epsilon > 0$ such that $a + \epsilon < x_0 - \epsilon$, there exists $s_\epsilon$ such that for $s < s_\epsilon$ and $x \in ]a + \epsilon/2, x_0 - \epsilon/2[$, $|Wf(s,x)|$ has no maxima. From Theorem 5.2 we derive that $f(x)$ is uniformly Lipschitz $n$ in $]a + \epsilon, x_0 - \epsilon[$. From this result we easily derive that $f(x)$ is uniformly Lipschitz $n$ in a neighborhood of any point $x_1 \in ]a, x_0[$. The same proof is valid for $x_1 \in ]x_0, b[$.

Let us now prove that the Lipschitz regularity at $x_0$ is characterized by the decay of the wavelet transform local maxima. Let $x_1 \in ]a, x_0[$ and $x_2 \in ]x_0, b[$. We proved that $f(x)$ is uniformly Lipschitz $n$ in the neighborhood of $x_1$ and $x_2$. The necessary condition of Theorem 3.3 is valid for integer Lipschitz exponents and it implies that there exists $s_0$ such that for $s < s_0$,

$$|Wf(s,x_1)| \le A_1 s^n \quad \text{and} \quad |Wf(s,x_2)| \le A_2 s^n. \qquad (B.1)$$

For $x \in ]x_1, x_2[$ and $s < s_0$, the value of $|Wf(s,x)|$ is smaller than or equal to the maximum value among $|Wf(s,x_1)|$, $|Wf(s,x_2)|$ and the wavelet transform modulus at all the local maxima that occur at the same scale inside the cone pointing to $x_0$. Theorem 5.3 supposes that all these local maxima have an amplitude smaller than $A s^\alpha$. Since $\alpha < n$, we derive from (B.1) that there exists a constant $B$ such that if $x \in ]x_1, x_2[$ and $s < s_0$,

$$|Wf(s,x)| \le B s^\alpha.$$

Since $x_0 \in ]x_1, x_2[$, Theorem 3.3 implies that $f(x)$ is Lipschitz $\alpha$ at $x_0$.
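The decay bound $|Wf(s,x)| \le B s^\alpha$ used above can be illustrated numerically. The following is a minimal sketch, not the authors' procedure: it assumes a test signal $f(x) = |x|^{1/2}$ (Lipschitz $1/2$ at the origin), a wavelet equal to the first derivative of a Gaussian, the normalization $\psi_s(x) = \psi(x/s)/s$, and a least-squares fit of $\log \max_x |Wf(s,x)|$ against $\log s$. The function names are hypothetical.

```python
import numpy as np

def wavelet_transform(f, dx, s):
    """Riemann-sum approximation of Wf(s, x) = (f * psi_s)(x), psi_s(x) = psi(x/s)/s.

    Assumption: psi is the first derivative of a Gaussian smoothing function.
    """
    half = int(np.ceil(6 * s / dx))               # kernel support truncated at 6 scales
    u = np.arange(-half, half + 1) * dx
    psi_s = -(u / s) * np.exp(-0.5 * (u / s) ** 2) / s
    return np.convolve(f, psi_s, mode="same") * dx

def estimate_lipschitz_exponent(f, x, scales, window):
    """Slope of log(max |Wf(s, x)|) versus log(s), with the max taken over |x| <= window."""
    dx = x[1] - x[0]
    inside = np.abs(x) <= window                  # stay away from the array borders
    peaks = [np.max(np.abs(wavelet_transform(f, dx, s))[inside]) for s in scales]
    slope, _ = np.polyfit(np.log(scales), np.log(peaks), 1)
    return slope

x = np.arange(-4000, 4001, dtype=float)
f = np.abs(x) ** 0.5                              # isolated singularity at x0 = 0
alpha_hat = estimate_lipschitz_exponent(f, x, scales=[8.0, 16.0, 32.0, 64.0], window=1000.0)
print(alpha_hat)                                  # expected to be close to 0.5
```

The fitted slope only estimates the exponent; it relies on the scales being large compared with the sampling step and small compared with the window.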
C.  Proof  of  Theorem  5.4

In order to apply Theorem 3.4, we want to prove that there exists a scale $s_1$ and $\epsilon > 0$ such that if $s < s_1$ and $x \in ]x_0 - \epsilon, x_0 + \epsilon[$,

$$|Wf(s,x)| \le B \left( s^\gamma + |x - x_0|^\gamma \right). \qquad (C.1)$$

We prove this by showing separately that there exist two constants $B_1$ and $B_2$ such that

$$Wf(s,x) \le B_1 s^\gamma, \qquad (C.2)$$

when $(s,x)$ is in the cone of influence of $x_0$ and

$$Wf(s,x) \le B_2 |x - x_0|^\gamma \qquad (C.3)$$

when $(s,x)$ is below the cone of influence of $x_0$. Once (C.1) is proved, Theorem 5.4 is a simple consequence of Theorem 3.4, for $\alpha < \gamma$. For $\alpha = \gamma$, we cannot apply Theorem 3.4 because we are missing the logarithmic term. Theorem 5.4 supposes that $Wf(s,x)$ has a constant sign in a neighborhood of $x_0$, and we shall suppose that it is positive. For $s < s_0$ and $|X(s) - x_0| \le Cs$, we have

$$Wf(s, X(s)) \le A s^\gamma. \qquad (C.4)$$

We first prove (C.2) and then (C.3) for $\epsilon = \frac{K - C}{4} s_0$ and $s_1 = \frac{K - C}{4K} s_0$.

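The wavelet $\psi(x)$ is the $n$th derivative of a positive function $\theta(x)$ of support equal to $[-K, K]$ and which is strictly positive on $]-K, K[$. Hence,

$$Wf(s,x) = s^n \left( f^{(n)} * \theta_s \right)(x) \ge 0, \qquad (C.5)$$

where $f^{(n)}(x)$ is the $n$th derivative of $f(x)$ in the sense of distributions. The function $\theta(x)$ is a positive function with a strictly positive integral. Since (C.5) is valid at all scales $s < s_0$, it implies that $f^{(n)}(x) \ge 0$ for $x \in ]a, b[$ (positive in the sense of distributions). Equation (C.5) can be rewritten

$$Wf(s,x) = s^{n-1} \int_{-\infty}^{+\infty} \theta\!\left(\frac{x-u}{s}\right) f^{(n)}(u)\, du.$$

Let $(s,x)$ be a point in the cone of influence of $x_0$, $|x - x_0| \le Ks$. The support of $\theta((x-u)/s)$ is included in $[x_0 - 2Ks, x_0 + 2Ks]$ so

$$Wf(s,x) = s^{n-1} \int_{x_0-2Ks}^{x_0+2Ks} \theta\!\left(\frac{x-u}{s}\right) f^{(n)}(u)\, du. \qquad (C.6)$$

Let $M = \max_{x \in [-K,K]} \theta(x)$. Since $\theta(x)$ is continuous and strictly positive over $]-K, K[$, there exists $\lambda > 0$ such that

$$\forall x \in \left[ -\frac{K+C}{2}, \frac{K+C}{2} \right], \quad \theta(x) \ge \lambda M.$$

Let $s' = 4Ks/(K - C)$. We know that $|x_0 - X(s')| \le Cs'$. For $u \in [x_0 - 2Ks, x_0 + 2Ks]$, we derive that $|(X(s') - u)/s'| \le (K + C)/2$ and therefore

$$\forall u \in [x_0 - 2Ks, x_0 + 2Ks], \quad \theta\!\left(\frac{X(s') - u}{s'}\right) \ge \lambda M.$$

Since $0 \le \theta((x-u)/s) \le M$ and $f^{(n)}(x) \ge 0$,

$$\int_{x_0-2Ks}^{x_0+2Ks} \theta\!\left(\frac{x-u}{s}\right) f^{(n)}(u)\, du \le \frac{1}{\lambda} \int_{x_0-2Ks}^{x_0+2Ks} \theta\!\left(\frac{X(s') - u}{s'}\right) f^{(n)}(u)\, du.$$

Equation (C.6) yields, since $s < s'$,

$$Wf(s,x) \le \frac{s^{n-1}}{\lambda} \int_{x_0-2Ks}^{x_0+2Ks} \theta\!\left(\frac{X(s') - u}{s'}\right) f^{(n)}(u)\, du \le \frac{1}{\lambda} Wf(s', X(s')). \qquad (C.7)$$

We suppose that (C.4) holds so

$$Wf(s', X(s')) \le A (s')^\gamma = A \left( \frac{4K}{K - C} \right)^{\gamma} s^\gamma.$$

We thus derive from (C.7) that

$$Wf(s,x) \le B_1 s^\gamma \quad \text{with} \quad B_1 = \frac{A}{\lambda} \left( \frac{4K}{K - C} \right)^{\gamma}. \qquad (C.8)$$

Let us now prove that if $(s,x)$ is below the cone of influence of $x_0$, $Wf(s,x) \le B_2 |x - x_0|^\gamma$.

$$Wf(s,x) = s^{n-1} \int_{-\infty}^{+\infty} \theta\!\left(\frac{x-u}{s}\right) f^{(n)}(u)\, du.$$

Let $s_2 = |x - x_0|/K$. Since $(s,x)$ is below the cone of influence of $x_0$, $|x - x_0| > Ks$, so $s < s_2$. The support of $\theta((x-u)/s)$ is thus included in $[x_0 - 2Ks_2, x_0 + 2Ks_2]$ so

$$Wf(s,x) = s^{n-1} \int_{x_0-2Ks_2}^{x_0+2Ks_2} \theta\!\left(\frac{x-u}{s}\right) f^{(n)}(u)\, du. \qquad (C.9)$$

Let us now define $s_2' = 4Ks_2/(K - C)$. With the same argument as for (C.7), we can prove that

$$Wf(s,x) \le \frac{1}{\lambda} Wf(s_2', X(s_2')). \qquad (C.10)$$

Equation (C.4) implies

$$Wf(s_2', X(s_2')) \le A (s_2')^\gamma = A \left( \frac{4}{K - C} \right)^{\gamma} |x - x_0|^\gamma. \qquad (C.11)$$

By inserting (C.11) in (C.10) we obtain

$$Wf(s,x) \le B_2 |x - x_0|^\gamma \quad \text{with} \quad B_2 = \frac{A}{\lambda} \left( \frac{4}{K - C} \right)^{\gamma}. \qquad (C.12)$$

One can verify that both (C.8) and (C.12) are valid for $x \in ]x_0 - \epsilon, x_0 + \epsilon[$ and $s < s_1$ with $\epsilon = \frac{K - C}{4} s_0$ and $s_1 = \frac{K - C}{4K} s_0$.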


D.  White  noise  wavelet  transform 


It is well known [24] that the density of zero-crossings of a differentiable Gaussian process whose autocorrelation is $R(\tau)$ is

$$\sqrt{\frac{-R^{(2)}(0)}{\pi^2 R(0)}}, \qquad (D.1)$$

where $R^{(n)}(\tau)$ is the $n$th derivative of $R(\tau)$. If the process is twice differentiable, the density of local extrema is equal to the density of zero-crossings of the derivative of the process. The autocorrelation of the derivative is $-R^{(2)}(\tau)$. Hence, the density of extrema is

$$\sqrt{\frac{-R^{(4)}(0)}{\pi^2 R^{(2)}(0)}}. \qquad (D.2)$$

The autocorrelation of the Gaussian process $Wn(s,x)$ is defined by

$$R(\tau) = E\big(Wn(s, x+\tau)\, Wn(s,x)\big) = E\left( \iint n(u)\, n(v)\, \psi_s(x + \tau - u)\, \psi_s(x - v)\, du\, dv \right).$$

Since $n(x)$ is white noise, $E(n(u) n(v)) = \delta(u - v)$ and we obtain

$$R(\tau) = \int_{-\infty}^{+\infty} \psi_s(\tau + u)\, \psi_s(u)\, du. \qquad (D.3)$$

From this equation, with $\psi_s(x) = \frac{1}{s}\psi(\frac{x}{s})$, we can prove that $R^{(2)}(0) = -\|\psi^{(1)}\|^2 / s^3$ and $R^{(4)}(0) = \|\psi^{(2)}\|^2 / s^5$. From (D.2), we derive that the density of extrema of the process $Wn(s,x)$ is

$$\frac{\|\psi^{(2)}\|}{s \pi \|\psi^{(1)}\|}. \qquad (D.4)$$

At least half of these local extrema are local maxima of $|Wn(s,x)|$. The number of local maxima depends upon the proportion of local extrema and zero-crossings of $Wn(s,x)$. Equations (D.1) and (D.3) prove that the density of zero-crossings of $Wn(s,x)$ is $\|\psi^{(1)}\| / (s \pi \|\psi\|)$. The proportion of local extrema and zero-crossings of $Wn(s,x)$ is independent of the scale, which proves that the density of local maxima of $|Wn(s,x)|$ is


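$$\frac{\lambda\, \|\psi^{(2)}\|}{s \pi \|\psi^{(1)}\|}, \qquad (D.5)$$

where $\lambda$ is a constant between 0.5 and 1 that depends only on $\|\psi\|$, $\|\psi^{(1)}\|$ and $\|\psi^{(2)}\|$.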
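The zero-crossing density $\|\psi^{(1)}\| / (s\pi\|\psi\|)$ stated above can be checked by simulation. The sketch below is only an illustration under stated assumptions: the wavelet is taken to be the first derivative of a Gaussian, the white noise is replaced by a discrete i.i.d. Gaussian sequence with unit sampling step, $\psi_s(x) = \psi(x/s)/s$, and the function names are hypothetical.

```python
import numpy as np

def psi(x):
    # Assumed wavelet: first derivative of a Gaussian smoothing function.
    return -x * np.exp(-0.5 * x**2)

def dpsi(x):
    # Derivative psi^(1) of the assumed wavelet.
    return (x**2 - 1.0) * np.exp(-0.5 * x**2)

def predicted_zero_crossing_density(s):
    """||psi^(1)|| / (s * pi * ||psi||), with the L2 norms computed by a Riemann sum."""
    x = np.linspace(-10.0, 10.0, 20001)
    dx = x[1] - x[0]
    norm_psi = np.sqrt(np.sum(psi(x) ** 2) * dx)
    norm_dpsi = np.sqrt(np.sum(dpsi(x) ** 2) * dx)
    return norm_dpsi / (s * np.pi * norm_psi)

def measured_zero_crossing_density(s, n_samples, seed):
    """Sign changes per unit length of discrete white noise filtered by psi_s."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n_samples)
    half = int(np.ceil(6 * s))
    u = np.arange(-half, half + 1, dtype=float)
    w = np.convolve(noise, psi(u / s) / s, mode="valid")
    crossings = np.count_nonzero(np.sign(w[1:]) != np.sign(w[:-1]))
    return crossings / (len(w) - 1)

s = 8.0
predicted = predicted_zero_crossing_density(s)
measured = measured_zero_crossing_density(s, n_samples=200_000, seed=0)
print(predicted, measured)
```

The two numbers should agree up to sampling fluctuations and discretization error, both of which shrink as the scale and the record length grow.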
Regular and irregular sampling theorems are proved using frames of exponentials, Gabor frames, and nonharmonic Fourier series. These include the Shannon sampling theorem, the Yao-Thomas irregular sampling theorem, and a result dual to the Yao-Thomas theorem. An irregular sampling algorithm is presented that allows much more general sampling lattices.
These  ideas  are  then  applied  to  the  Gabardo-Walker  uniqueness  theorem  to 
obtain  a  corresponding  representation  theorem. 


1.  Classical  sampling  theory  using  frames 

Ordinary Fourier series in $L^2[-\frac{1}{2T}, \frac{1}{2T}]$, $T > 0$, that is expansions using exponentials of the form $\{e^{-2\pi i n T \gamma}\}$, have been used in mathematics, engineering, and science for years. Here we give several applications of nonharmonic Fourier series, that is expansions using exponentials of the form $\{e^{-2\pi i t_n \gamma}\}$ where the regular sequence of real numbers $\{nT\}$ has been replaced by the irregular sequence $\{t_n\}$. The difficulty with applying nonharmonic Fourier
series  is  that  they  are  not  orthonormal  bases  and  so  the  nonharmonic  Fourier 
series  must  be  interpreted  carefully.  The  concept  of  a  frame  provides  this 
interpretation. 

† The work presented here is a short exposition of joint work with John Benedetto, whose patience, friendship, and teachings have left a deep and positive mark on me. I would also like to thank Hans Feichtinger and Christian Houdré for insights, discussions, and preprints on the topics discussed herein.

Definition/Proposition 1.1. A sequence $\{g_n\} \subseteq H$, a separable Hilbert space, is a frame if there exist constants $A, B > 0$ such that

$$\forall h \in H, \quad A\|h\|^2 \le \sum_n |\langle h, g_n \rangle|^2 \le B\|h\|^2.$$

The constant $A$ (resp. $B$) is called the lower (resp. upper) frame bound. If $\{g_n\}$ is a frame, then we have the reconstruction formulas

$$\forall h \in H, \quad h = \sum_n \langle h, S^{-1} g_n \rangle\, g_n$$

$$\forall h \in H, \quad h = \sum_n \langle h, g_n \rangle\, S^{-1} g_n$$

where the frame operator $S$ is given by

$$S h = \sum_n \langle h, g_n \rangle\, g_n.$$

If $\{g_n\}$ is a frame in $H$, then $\{S^{-1} g_n\}$ is also a frame in $H$ called the dual frame.
Proof. See [4, 9, 3]. ∎
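The frame inequality and both reconstruction formulas can be verified directly in a finite-dimensional Hilbert space. The sketch below is an illustration only: it assumes $H = \mathbb{R}^2$ and three unit vectors spaced 120 degrees apart, a choice not taken from the text. In finite dimensions the optimal frame bounds are the extreme eigenvalues of the matrix of $S$.

```python
import numpy as np

# Assumed example frame: three unit vectors in R^2, 120 degrees apart (rows are g_n).
angles = np.pi / 2 + 2 * np.pi * np.arange(3) / 3
G = np.stack([np.cos(angles), np.sin(angles)], axis=1)

# Frame operator S h = sum_n <h, g_n> g_n, i.e. the matrix G^T G.
S = G.T @ G

# Optimal frame bounds are the smallest and largest eigenvalues of S.
eigenvalues = np.linalg.eigvalsh(S)
A, B = eigenvalues[0], eigenvalues[-1]

# Dual frame: rows are S^{-1} g_n (S is symmetric, so G @ inv(S) gives these rows).
dual = G @ np.linalg.inv(S)

h = np.array([0.3, -1.7])
h_first = (dual @ h) @ G     # sum_n <h, S^{-1} g_n> g_n
h_second = (G @ h) @ dual    # sum_n <h, g_n> S^{-1} g_n

print(A, B, h_first, h_second)
```

For this particular frame $S$ is a multiple of the identity, so the two bounds coincide and the dual frame is a rescaled copy of the frame itself.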

Example  1.2. 

1 )  Orthonormal  bases  in  Hilbert  spaces  are  frames  with  the  reconstruction 
formulas  being  the  orthonormal  basis  decomposition  and  the  frame 
operator  being  the  identity. 

2)  The exponentials $\{E_{t_n}\} = \{e^{-2\pi i t_n \gamma}\}$ where $\{t_n\}$ satisfies

$$|t_n - nT| \le L < \frac{T}{4}$$

is a frame for $L^2[-\frac{1}{2T}, \frac{1}{2T}]$ (see Example 1.9). More generally, see Definition 2.1 and Theorems 2.2 and 2.3.


Remark  1.3.  Duffin  and  Schaeffer  invented  the  concept  of  a  frame  to  deal 
with  questions  about  spanning  properties  of  sets  of  exponentials.  That  is, 
they were interested in whether a collection of exponentials $\{E_{t_n}\} = \{e^{-2\pi i t_n \gamma}\}$, $n \in \mathbb{Z}$, generated by a sequence of real or complex numbers $\{t_n\}$ was complete in $L^2[-\Omega, \Omega]$, $\Omega > 0$, i.e., whether each function in $L^2[-\Omega, \Omega]$ can be approximated arbitrarily closely by a linear combination of exponentials taken from the collection. Much work has been done on this and related questions as the interested reader may investigate by consulting [2, 17, 15, 20]. For our purposes completeness is not enough; we want to decompose
functions  as  sums  of  exponentials  or  other  functions.  The  reconstruction 
formulas  permit  this  and  represent,  until  the  work  on  wavelets  and  related 
topics,  a  neglected  aspect  of  the  work  of  Duffin  and  Schaeffer.  These  formu¬ 
las  are  at  the  heart  of  the  sampling  work  which  follows. 
Definition/Proposition 1.4. A function $f$ is $\Omega$-bandlimited, $\Omega > 0$, i.e., $f \in PW_\Omega$, if $f \in L^2(\mathbb{R})$ with $\operatorname{supp} \hat{f} \subseteq [-\Omega, \Omega]$, where $\hat{f}$ is the Fourier transform $\hat{f}(\gamma) = \int f(t)\, e^{-2\pi i t \gamma}\, dt$. The $\Omega$-bandlimited functions are entire functions of exponential type $\Omega$ and conversely, i.e., there exists a constant $A$ such that

$$\forall z \in \mathbb{C}, \quad |f(z)| \le A e^{2\pi\Omega|z|}.$$


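Theorem 1.5 (Shannon). Let $T, \Omega > 0$ for which $0 < \Omega \le \frac{1}{2T}$. Then

$$\forall f \in PW_\Omega, \quad f(t) = T \sum_n f(nT)\, d_{\pi/T}(t - nT)$$

where $f(nT)$ is the value of $f$ at $nT \in \mathbb{R}$, where $d_{\pi/T}$ is the $\frac{\pi}{T}$ dilation of the Dirichlet (or "sinc") function

$$d(t) = \frac{\sin t}{\pi t}$$

and where $d_{\pi/T}(t - nT)$ is the translation

$$d_{\pi/T}(t - nT) = \frac{\sin \frac{\pi}{T}(t - nT)}{\pi (t - nT)}.$$

The convergence is in $L^2(\mathbb{R})$ and uniform over $\mathbb{R}$.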

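Proof. Consider the frame of exponentials in $L^2[-\frac{1}{2T}, \frac{1}{2T}]$ given by $\{e^{-2\pi i n T \gamma}\}_{n \in \mathbb{Z}}$. We know this collection is a frame as, upon normalization, it is an orthonormal basis in $L^2[-\frac{1}{2T}, \frac{1}{2T}]$. The frame operator is the constant multiplier $\frac{1}{T}$ as

$$\forall h \in L^2[-\tfrac{1}{2T}, \tfrac{1}{2T}], \quad S(h) = \sum_n \langle h, e^{-2\pi i n T \gamma} \rangle\, e^{-2\pi i n T \gamma} = \frac{1}{T} h.$$

Hence both reconstruction formulas reduce to

$$\hat{f}(\gamma) = T \sum_n \langle \hat{f}, e^{-2\pi i n T \gamma} \rangle\, e^{-2\pi i n T \gamma}\, \mathbf{1}_{[-\frac{1}{2T}, \frac{1}{2T}]}(\gamma) \qquad (1.1)$$

where $\mathbf{1}_{[-\frac{1}{2T}, \frac{1}{2T}]}$ is 1 on $[-\frac{1}{2T}, \frac{1}{2T}]$ and 0 elsewhere. The characteristic function $\mathbf{1}_{[-\frac{1}{2T}, \frac{1}{2T}]}$ is necessary as (1.1) is an expansion in $L^2[-\frac{1}{2T}, \frac{1}{2T}]$, which we view as a subspace of $L^2(\mathbb{R})$. Applying the inverse Fourier transform and evaluating the coefficients produces the desired expansion with $L^2(\mathbb{R})$ convergence. The uniform convergence can be verified by an advanced calculus argument. ∎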
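The sampling expansion of Theorem 1.5 can be tried numerically on a truncated sum. The sketch below is illustrative only: the test function $f(t) = \big(\sin(\pi\Omega t)/(\pi\Omega t)\big)^2$, the values $\Omega = 0.4$ and $T = 1$, and the truncation to $|n| \le 2000$ are assumptions made for the example. It uses the identity $T\, d_{\pi/T}(u) = \sin(\pi u/T)/(\pi u/T)$, which is what `np.sinc(u / T)` computes.

```python
import numpy as np

def shannon_reconstruct(samples, n, T, t):
    """Truncated Shannon series T * sum_n f(nT) d_{pi/T}(t - nT).

    np.sinc(x) = sin(pi x) / (pi x), so T * d_{pi/T}(u) equals np.sinc(u / T).
    """
    t = np.atleast_1d(np.asarray(t, dtype=float))
    kernel = np.sinc((t[:, None] - n[None, :] * T) / T)
    return kernel @ samples

Omega, T = 0.4, 1.0                      # satisfies 0 < Omega <= 1/(2T)

def f(t):
    # Assumed test function: its Fourier transform is supported in [-Omega, Omega].
    return np.sinc(Omega * t) ** 2

n = np.arange(-2000, 2001)               # truncation of the infinite lattice
t = np.array([0.25, 1.5, -3.7])
approx = shannon_reconstruct(f(n * T), n, T, t)
print(np.max(np.abs(approx - f(t))))     # truncation error of the finite sum
```

The slow $1/t$ decay of the sinc kernel is visible here: the error of the truncated sum is controlled only because the chosen test function itself decays quadratically.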

Remark  1.6. 

1)  The coefficients are the function values at the points $\{nT\}_{n \in \mathbb{Z}}$, hence the name "sampling formula." Note that the sampling lattice $\{nT\}$ generated the frame of exponentials in $L^2[-\frac{1}{2T}, \frac{1}{2T}]$.

2)  The decomposing functions are translates of a single $\frac{1}{2T}$-bandlimited function, $d_{\pi/T}(t)$, the dilated sinc function.

3)  The poor decay of the sinc function can be overcome by using an oversampling argument that smooths out the discontinuities of $\mathbf{1}_{[-\frac{1}{2T}, \frac{1}{2T}]}$. This is accomplished by multiplying both sides of (1.1) by a function $\hat{s} \in C_c^\infty(\mathbb{R})$ with $\operatorname{supp} \hat{s} \subseteq [-\frac{1}{2T}, \frac{1}{2T}]$ and $\hat{s}(\gamma) = 1$ for all $\gamma \in [-\Omega, \Omega]$.

4)  Clearly one need not invoke the concept of a frame of exponentials in the proof above as the exponentials, upon normalization, form an orthonormal basis for $L^2[-\frac{1}{2T}, \frac{1}{2T}]$. The point is that the proof above is generalizable to a class of irregular sampling lattices, $\{t_n\}$ instead of $\{nT\}$, where $\{t_n\}$ is a sequence of real numbers. By doing this, we will be able to reproduce the classical irregular sampling formula of Yao and Thomas, obtain a new dual result, and finally produce an irregular sampling algorithm for sampling lattices with great generality. To accomplish this we will need a few more facts about frames.


Definition/Proposition  1.7  ([4]).  A  frame  in  a  separable  Hilbert  space 
is  exact  if  it  ceases  to  be  a  frame  upon  the  removal  of  any  one  element. 
Orthonormal  bases  are  exact  frames,  but  it  can  be  shown  that  the  union  of 
two  orthonormal  bases  is  a  frame  that  is  not  exact.  Several  less  elementary 
examples  are  given  in  Example  1.9. 

Definition/Proposition 1.8 ([4]). Let $\{g_n\}$ be an exact frame in a separable Hilbert space $H$. Then the frame $\{g_n\}$ and the dual frame $\{S^{-1} g_n\}$ are biorthonormal, i.e.,

$$\langle g_m, S^{-1} g_n \rangle = \delta_{mn}$$

where $\delta_{mn} = 1$ if $m = n$ and zero otherwise.

The point of this definition/proposition is that one can, in certain cases, explicitly construct the sequence biorthonormal to $\{g_n\}$, that is the dual frame $\{S^{-1} g_n\}$, by methods other than an involved analysis of the inverse frame operator $S^{-1}$. For certain collections of exponentials this can be accomplished by using Lagrange interpolation theory and function theory. This is done in the next example. For convenience we let $E_{t_n} = e^{-2\pi i t_n \gamma}$, $n \in \mathbb{Z}$, where the domain of $\gamma$, e.g. $[-\frac{1}{2T}, \frac{1}{2T}]$, depends on the context.

Example  1.9. 

1)  (Kadec-Levinson) If $\{t_n\}$ satisfies

$$|t_n - nT| \le L < \frac{T}{4}$$

then $\{E_{t_n}\}$ is an exact frame for $L^2[-\frac{1}{2T}, \frac{1}{2T}]$, with dual frame $\{h_n = S^{-1} E_{t_n}\}$ given by

$$\check{h}_n(t) = r_n(t) = \frac{r(t)}{r'(t_n)(t - t_n)}$$

where

$$r(t) = (t - t_0) \prod_{n=1}^{\infty} \left(1 - \frac{t}{t_n}\right)\left(1 - \frac{t}{t_{-n}}\right).$$

See [20, 15, 12].

2)  It can be shown [10, Section 5.3] that any finite modification of an exact frame of exponentials is also an exact frame, i.e., replacing any finite number of exponentials with exponentials at other points not already contained in the collection also produces an exact frame.


We  now  apply  these  ideas  to  obtain  the  Yao-Thomas  irregular  sampling 
theorem  [19],  which  is  the  first  expansion  below,  and  a  dual  result. 

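Theorem 1.10. Assume $\{t_n\}$ satisfies the Kadec-Levinson condition

$$|t_n - nT| \le L < \frac{T}{4}.$$

Then

$$\forall f \in PW_\Omega, \quad f(t) = \sum_n f(t_n)\, r_n(t)$$

and

$$\forall f \in PW_\Omega, \quad f(t) = \sum_n \langle f, r_n \rangle\, d_{\pi/T}(t - t_n)$$

where $\{r_n\}$ is as defined above. Both series converge uniformly to $f$ on $\mathbb{R}$ as well as in $L^2(\mathbb{R})$.

Proof. By the first assertion of Example 1.9 and the reconstruction formulas we have, with inner products taken in $L^2[-\frac{1}{2T}, \frac{1}{2T}]$,

$$\forall f \in PW_\Omega, \quad \hat{f}(\gamma) = \sum_n \langle \hat{f}, E_{t_n} \rangle\, h_n(\gamma)$$

and

$$\forall f \in PW_\Omega, \quad \hat{f}(\gamma) = \sum_n \langle \hat{f}, h_n \rangle\, E_{t_n}(\gamma)\, \mathbf{1}_{[-\frac{1}{2T}, \frac{1}{2T}]}(\gamma).$$

Applying the inversion formula to both expansions and the Parseval relation to the coefficients in the second, we obtain the $L^2(\mathbb{R})$ convergent sums of the theorem. The uniform convergence follows as in [19] or by using an advanced calculus argument. ∎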

Remark  1.11. 

1)  The  first  expansion  in  the  previous  theorem  is  the  Yao-Thomas  irreg¬ 
ular  sampling  formula  [19].  Yao  and  Thomas  derived  their  sampling 
formula  using  the  Lagrange  interpolation  work  of  Levinson  [15,  Chap¬ 
ter  4]  and  Levin  [14,  p.  198],  providing  an  interpretation  of  it  in  terms  of 
engineering  considerations.  However,  the  second  expansion  cannot  be 
obtained  directly  from  interpolation  considerations  and  hence  appears 
to  be  new. 

2)  Both  of  the  expansions  above  can  be  produced  using  the  idea  of  a  Riesz 
basis  of  exponentials,  as  exact  frames  are  Riesz  bases  and  conversely. 
This  approach  is  described  in  [10]  (see  also  [20]). 

3)  The  Kadec-Levinson  condition  and  the  other  examples  given  above  are 
restrictive  and,  as  such,  we  seek  sampling  formulas  for  a  wider  class 
of sampling lattices $\{t_n\}$. So far we have employed orthonormal bases
and  Riesz  bases  (exact  frames)  of  exponentials.  By  a  basis  we  mean  a 
collection  in  a  Banach  space  by  which  every  element  of  the  space  can  be 
written  uniquely  as  a  (possibly  infinite)  linear  combination  of  elements 
from  the  collection.  One  could  ask  whether  it  is  possible  to  obtain  sam¬ 
pling  formulas  employing  bases  of  exponentials  that  are  not  Riesz  bases 
or orthonormal bases. According to Young [20, p. 197], no example has yet been found of a basis of exponentials for $L^2[-\frac{1}{2T}, \frac{1}{2T}]$ that is not a Riesz basis. There are, however, examples of collections of exponentials which are complete and minimal but which are not known to be bases of exponentials [20, p. 126]. By minimal we mean a collection in which each element is not contained in the closed span of the other elements of the collection. A basis is necessarily minimal and complete but a minimal, complete set need not be a basis [10, Section 4.2]. Sampling formulas for these collections can be produced using Gram-Schmidt orthogonalization. This is discussed in [10, Section 4.2].

4)  The expansions for $\hat{f}$ given in the proof can be multiplied by functions $\hat{s} \in C_c^\infty(\mathbb{R})$ with $\operatorname{supp} \hat{s} \subseteq [-\frac{1}{2T}, \frac{1}{2T}]$ and with $\hat{s}(\gamma) = 1$ for all $\gamma \in [-\Omega, \Omega]$, as described in Remark 1.6 (3). Upon inversion this gives $(s * r_n)(t)$ and $s(t - t_n)$, respectively, in the expansions of the previous theorem.


2.  Modern  sampling  theory  using  frames 

To  take  advantage  of  the  full  power  of  frames,  we  drop  the  requirement 
in  the  previous  theorem  that  the  frame  be  exact.  Doing  so  creates  two 
obstacles.  The  first  is  whether  there  are  any  sequences  which  generate 
frames of exponentials for $L^2[-\frac{1}{2T}, \frac{1}{2T}]$ which are not exact. The second relates to the analysis of the inverse frame operator $S^{-1}$. In the
previous  theorem,  we  used  the  biorthonormality  relation  described  in  the 
proposition.  This  relation  is  not  true  if  the  frame  is  not  exact.  Hence  we 
must  find  a  realization  of  the  inverse  frame  operator  that  is  both  useful  and 
applies  to  frames  which  are  not  necessarily  exact. 

To  answer  the  first  question  we  describe  the  work  of  Duffin  and  Scha¬ 
effer  and  the  work  of  Jaffard  on  this  topic. 

Definition 2.1. A sequence $\{t_n\}$ is uniformly discrete if there exists a constant $d$ such that

$$\forall n \ne m, \quad |t_n - t_m| \ge d > 0.$$

A sequence $\{t_n\}$ is uniformly dense if it is uniformly discrete and there exist constants $\Delta, L > 0$ such that

$$\forall n \in \mathbb{Z}, \quad \left| t_n - \frac{n}{\Delta} \right| \le L.$$

The constant $\Delta$ is called the uniform density for such sequences.

Theorem 2.2 ([4]). If $\{t_n\}$ has uniform density $\Delta > 0$, then $\{E_{t_n}\}$ is a frame for $L^2[-\Omega, \Omega]$ where $0 < 2\Omega < \Delta$ and where $E_{t_n} = e^{-2\pi i t_n \gamma}$, $n \in \mathbb{Z}$.

Theorem 2.3 ([11]). The sequence $\{t_n\}$ generates a frame of exponentials for $L^2(I)$ where $I$ is an interval if and only if it can be written as the finite union of uniformly discrete subsequences at least one of which is uniformly dense.

Remark 2.4.


1)  While not explicitly indicated in Jaffard's theorem, there is a relationship between the length of $I$ and the uniform density of all uniformly dense subsequences of $\{t_n\}$ [11].

2)  The completeness radius of a sequence $\{t_n\}$ is the supremum over all non-negative real numbers $\Omega$ such that $\{E_{t_n}\}$ is complete in $L^2[-\Omega, \Omega]$. This concept has a long history as the reader can investigate in [2, 13, 15, 16, 17]. The result of Jaffard above arose in his investigation of the concept of the frame radius, that is the supremum over all non-negative real numbers $\Omega$ such that $\{E_{t_n}\}$ is a frame in $L^2[-\Omega, \Omega]$.

3)  The Duffin-Schaeffer and Jaffard theorems give the answer to the first question asked above. If we choose a sequence $\{t_n\}$ that is the union of a uniformly dense subsequence with a uniform density $\Delta > 2\Omega$, and a finite number of uniformly discrete subsequences, then $\{E_{t_n}\}$ is a frame for $L^2[-\Omega, \Omega]$. This gives a sufficiently rich class of sequences for us to investigate the existence of an irregular sampling algorithm employing them.

4)  Uniformly discrete sequences generate upper frame bounds for sets of exponentials. Uniformly dense sequences generate upper frame bounds, as they are uniformly discrete, but also lower frame bounds. For the frame $\{E_{t_n}\}$ mentioned in the previous remark, explicit estimates for the upper frame bound always exist. Using the work of Plancherel and Pólya [20, pp. 93-98] one can show [10, Section 4.3] that for any uniformly discrete set $\{t_n\}$ the upper frame bound $B$ for the set of exponentials $\{E_{t_n}\}$ in $L^2[-\Omega, \Omega]$ exists and satisfies


$$B \le \frac{2}{\pi^2 \Omega d^2} \left( e^{2\pi\Omega d} - 1 \right)$$

where $d > 0$ is the minimum separation between sequence points $\{t_n\}$. If $\{t_n\}$ is the union of a finite number of uniformly discrete subsequences $\{t_n^1\}, \ldots, \{t_n^k\}$, then the upper frame bound for the exponentials $\{E_{t_n}\}$ in $L^2[-\Omega, \Omega]$ is the sum $B_1 + B_2 + \ldots + B_k$ where $B_1, \ldots, B_k$ satisfy an estimate of the form given above for each of the uniformly discrete subsequences $\{t_n^1\}, \{t_n^2\}, \ldots, \{t_n^k\}$.

Uniformly dense sequences $\{t_n\}$ impose lower frame bounds on the corresponding set of exponentials $\{E_{t_n}\}$ as well as upper frame bounds. (Recall that uniformly dense sequences are also uniformly discrete.) The lower frame bounds are also additive in the case that the sequence $\{t_n\}$ is composed of a finite number of uniformly dense subsequences. However, the lower frame bound is highly dependent on the distribution of the points and the density of the uniformly dense set. No simple relationship is known for the lower frame bound of a uniformly dense set, as is the case for the upper frame bound for a uniformly discrete set. However, in certain useful cases explicit lower bounds can be given, as will be described later (see Remark 3.3 (3)).

5)  Note  that  in  the  definition  of  a  uniformly  dense  set,  the  value  of  L 
could  be  any  positive  number.  As  such,  it  is  possible  to  have  large 
gaps  in  the  sampling  lattice — i.e.,  places  where  the  distance  between 
consecutive  lattice  points  {tn}  is  large — by  taking  the  value  of  L  large 
enough.  This  is  an  advantage  of  the  frame  approach  as  compared 
to  other  approaches  to  irregular  sampling  (see  the  work  of  Karlheinz 
Grochenig  [7]  in  this  volume). 

To deal with the second problem associated with applying non-exact frames of exponentials to sampling problems, namely the problem of analyzing the inverse frame operator, we need the following fact about the frame operator. This proposition represents the Neumann expansion for the inverse frame operator.

Proposition 2.5 ([4, 3]). If $\{g_n\} \subseteq H$, a separable Hilbert space, is a frame with frame bounds $A \le B$, then

$$\forall h \in H, \quad S^{-1}(h) = \frac{2}{A+B} \sum_{k=0}^{\infty} \left( I - \frac{2S}{A+B} \right)^k (h),$$

where

$$\left\| I - \frac{2S}{A+B} \right\| \le \frac{B-A}{A+B} < 1.$$
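The Neumann expansion of Proposition 2.5 translates into a short iteration. The sketch below is an illustration in $\mathbb{R}^3$ with an assumed seven-vector frame; it is not taken from the text. It applies the truncated series to a vector and compares the result with a direct linear solve. The function name `neumann_inverse_apply` is hypothetical.

```python
import numpy as np

def neumann_inverse_apply(S, h, A, B, K):
    """Approximate S^{-1} h by (2/(A+B)) * sum_{k=0}^{K} (I - 2S/(A+B))^k h."""
    mu = 2.0 / (A + B)
    term = np.array(h, dtype=float)          # (I - mu S)^0 h
    total = np.zeros_like(term)
    for _ in range(K + 1):
        total += term
        term = term - mu * (S @ term)        # apply (I - mu S) once more
    return mu * total

# Assumed example frame in R^3 (rows are the frame vectors g_n).
G = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
              [1, 1, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
S = G.T @ G                                   # frame operator as a matrix
eigenvalues = np.linalg.eigvalsh(S)
A, B = eigenvalues[0], eigenvalues[-1]        # optimal frame bounds

h = np.array([1.0, -2.0, 0.5])
approx = neumann_inverse_apply(S, h, A, B, K=60)
exact = np.linalg.solve(S, h)
contraction = np.linalg.norm(np.eye(3) - 2.0 * S / (A + B), 2)
print(approx, exact, contraction, (B - A) / (A + B))
```

The error after $K$ terms decays like $\big((B-A)/(A+B)\big)^{K+1}$, so tight frames ($A = B$) need only the $k = 0$ term.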


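Lemma 2.6. Let $T, \Omega > 0$ for which $0 < \Omega \le \frac{1}{2T}$ and let $\{t_n\}$ generate a frame of exponentials $\{E_{t_n}\}$ for $L^2[-\frac{1}{2T}, \frac{1}{2T}]$. Then

$$\forall f \in PW_\Omega, \quad f(t) = \sum_n \langle \hat{f}, S^{-1}(E_{t_n}) \rangle\, d_{\pi/T}(t - t_n)$$

where the inner product is taken in $L^2[-\frac{1}{2T}, \frac{1}{2T}]$, where the series converges uniformly on $\mathbb{R}$ to $f$ and in $L^2(\mathbb{R})$, and where the coefficients can be approximated by (infinite) linear combinations of the sample values by taking truncations of the Neumann series for $S^{-1}$ given by

$$c_n = \langle \hat{f}, S^{-1}(E_{t_n}) \rangle = \frac{2}{A+B} \sum_{k=0}^{\infty} \left\langle \hat{f}, \left( I - \frac{2S}{A+B} \right)^k E_{t_n} \right\rangle.$$

Proof. Applying the appropriate reconstruction formula to the frame of exponentials $\{E_{t_n}\}$ we have

$$\forall f \in PW_\Omega, \quad \hat{f}(\gamma) = \sum_n \langle \hat{f}, S^{-1}(E_{t_n}) \rangle\, E_{t_n}(\gamma)\, \mathbf{1}_{[-\frac{1}{2T}, \frac{1}{2T}]}(\gamma).$$

The first conclusion follows by applying the inversion formula to this, while the uniform convergence follows by either an advanced calculus argument or as in [19]. The second conclusion is an application of the previous proposition and the fact that $S^{-1}$ is self-adjoint. ∎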

The  lemma  appears  to  have  nothing  to  do  with  sampling  as  the  sample 
values  do  not  appear  in  the  expansion  given  above.  However,  when  we 
analyze the coefficients $\{c_n\}$ by truncating the coefficient expansions, we see
that  the  sample  values  of  f  appear.  This  produces  the  following  algorithm. 


Algorithm 2.7. We obtain an irregular sampling algorithm by truncating the Neumann expansion for the coefficients at various places. For example, if we take the $k = 0$ term only, we have

$$c_n \approx \frac{2}{A+B} \langle \hat{f}, E_{t_n} \rangle = \frac{2}{A+B} f(t_n).$$

Note that if $t_n = nT$, $n \in \mathbb{Z}$, then $A = B = \frac{1}{T}$, and so $c_n \approx T f(nT)$, as we would expect from the Shannon sampling theorem (Theorem 1.5). In fact, if we truncate after any value of $k$ for $t_n = nT$, $n \in \mathbb{Z}$, with $A = B = \frac{1}{T}$, then the approximation to the coefficients is exactly $\frac{2}{A+B} f(t_n) = T f(nT)$, so we can conclude that $c_n = T f(nT)$ in this case.

If we keep only the $k = 0$ and $k = 1$ terms, we have

$$c_n \approx \frac{2}{A+B} f(t_n) + \frac{2}{A+B} \left\langle \hat{f}, \left( I - \frac{2S}{A+B} \right) E_{t_n} \right\rangle = \frac{4}{A+B} f(t_n) - \left( \frac{2}{A+B} \right)^2 \sum_m f(t_m)\, d_{\pi/T}(t_n - t_m).$$

Again, if $t_n = nT$, $n \in \mathbb{Z}$, then $A = B = \frac{1}{T}$ and, since $T d_{\pi/T}(t_n - t_m) = \delta_{mn}$ in this case, we have

$$c_n \approx 2T f(nT) - T f(nT) = T f(nT)$$

as we claimed above.
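The truncations above follow a pattern that is convenient in matrix form. Writing $y_n = f(t_n)$ and $G_{nm} = d_{\pi/T}(t_n - t_m)$, the truncation after the $k = K$ term is $c \approx \mu \sum_{k=0}^{K} (I - \mu G)^k y$ with $\mu = 2/(A+B)$. The sketch below is only an illustration under assumptions: the lattice is cut down to finitely many points, the samples are real, and the frame bounds $A$, $B$ are supplied by the caller rather than computed. The function names are hypothetical.

```python
import numpy as np

def gram_matrix(t, T):
    """G[n, m] = d_{pi/T}(t_n - t_m) = sin(pi (t_n - t_m) / T) / (pi (t_n - t_m))."""
    diff = t[:, None] - t[None, :]
    return np.sinc(diff / T) / T              # np.sinc(x) = sin(pi x) / (pi x)

def neumann_coefficients(t, samples, T, A, B, K):
    """Truncated coefficient expansion: mu * sum_{k=0}^{K} (I - mu G)^k y, mu = 2/(A+B)."""
    mu = 2.0 / (A + B)
    G = gram_matrix(t, T)
    term = np.array(samples, dtype=float)
    total = np.zeros_like(term)
    for _ in range(K + 1):
        total += term
        term = term - mu * (G @ term)
    return mu * total

T = 0.5
n = np.arange(-20, 21)
f = lambda t: np.sinc(0.8 * t) ** 2           # assumed bandlimited test function

# Regular lattice t_n = nT with A = B = 1/T: every truncation returns T f(nT).
t_regular = n * T
c_regular = [neumann_coefficients(t_regular, f(t_regular), T, 1 / T, 1 / T, K) for K in (0, 1, 5)]

# Jittered lattice with hypothetical bounds A, B: compare K = 1 with the closed form above.
t_jitter = n * T + 0.05 * np.sin(n)
y = f(t_jitter)
A, B = 1.5, 2.5
c_one = neumann_coefficients(t_jitter, y, T, A, B, K=1)
c_closed = 4 / (A + B) * y - (2 / (A + B)) ** 2 * (gram_matrix(t_jitter, T) @ y)
```

Each additional term costs one multiplication by $G$, which is why kernels with faster decay than $d_{\pi/T}$ are attractive for long sequences.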


Remark  2.8. 

1)  As we take larger and larger values of $k$, we observe that the computation of the approximations to the coefficients falls into a pattern that is suitable for programming on a computer.

2)  We can obtain versions of this algorithm with sampling kernels having more rapid decay than $d_{\pi/T}$ by multiplying both sides of the expansion in the proof of Lemma 2.6 by a function $\hat{s} \in C_c^\infty(\mathbb{R})$ with $\operatorname{supp} \hat{s} \subseteq [-\frac{1}{2T}, \frac{1}{2T}]$ and $\hat{s}(\gamma) = 1$ for all $\gamma \in [-\Omega, \Omega]$, as described in Remark 1.6 (3).

3)  We have truncation error estimates for both the coefficient expansions and the sampling expansion. See [10, Section 4.3].

4)  One could also consider using the other reconstruction formula in the proof of Lemma 2.6, as the two reconstruction formulas produced two different sampling formulas when we assumed the Kadec-Levinson condition (Theorem 1.10). When using the Neumann expansion, the two reconstruction formulas do in fact produce the same sampling formula. This can be shown by an induction argument. See [10, Theorem 4.3.1].

5)  The sampling theory presented above can be reproduced using Gabor frames (also called Weyl-Heisenberg, or weighted Fourier, frames). Gabor frames are frames for the separable Hilbert space $L^2(\mathbb{R})$ composed of elements of the form $\{e^{2\pi i m b t} g(t - na)\}_{m,n \in \mathbb{Z}}$ where $a, b$ are real numbers such that $ab \le 1$ and $g \in L^2(\mathbb{R})$. If $a$, $b$ and $g$ satisfy certain additional assumptions, then the collection above will be a frame for $L^2(\mathbb{R})$ (see [9]). As indicated in (?) and [10, Chapters 2 and 3], one can generalize this construction to allow irregular sequences $\{t_m\}$ to take the place of the regular lattice $\{mb\}$. The central ingredient needed to accomplish this is that $\{E_{t_m}\}$ be a frame of exponentials for $L^2[-\Omega, \Omega]$, $\Omega > 0$. The Shannon theorem, the Yao-Thomas theorem and its dual, and the irregular sampling algorithm can all be reproduced using Gabor frames. The chief advantage of this approach is that while the coefficients $c_n$ in the irregular sampling algorithm above contain the slowly decaying factors $d_{\pi/T}(t_n - t_m)$, the coefficients in the Gabor frame construction can have more rapidly decaying factors $s(t_n - t_m)$ where $s$ is a function of the type mentioned in (2) above with the additional assumption that $\hat{s} > 0$ on $]-\frac{1}{2T}, \frac{1}{2T}[$. This allows for more rapid convergence of the coefficient expansions and hence better numerical performance.


3.  Application 

As an application of these ideas, we prove the following theorem of Gabardo [6] and Walker [18] in all except the extreme case. As a bonus, we obtain a sampling theorem for entire functions of exponential type $\Omega$.

Theorem 3.1 (Gabardo-Walker). Let $f \in L^2(\mathbb{R})$ with $\operatorname{supp} \hat{f} \subseteq [-\Omega, \Omega]$, $\Omega > 0$. Assume $f(t_n) = 0$ $\forall n \in \mathbb{Z}$ where $\{t_n\}$ is a sequence of real numbers satisfying

1)  $t_n < t_{n+1}$;

2)  $\lim_{n \to \infty} t_n = \infty$ and $\lim_{n \to -\infty} t_n = -\infty$;

3)  $\sup_n (t_{n+1} - t_n) = B < \infty$.

If $2\Omega B \le 1$, then $f$ vanishes identically.


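Proof (G-W). If $f \not\equiv 0$, then Bernstein's inequality [20, pp. 84, 86-87] gives

$$\|f'\|_2 \le 2\pi\Omega\, \|f\|_2 .$$

On the other hand, by Wirtinger's inequality [8, p. 184], [5, p. 47],

$$\int_{t_n}^{t_{n+1}} |f(t)|^2\,dt \;\le\; \frac{(t_{n+1}-t_n)^2}{\pi^2} \int_{t_n}^{t_{n+1}} |f'(t)|^2\,dt \;\le\; \frac{B^2}{\pi^2} \int_{t_n}^{t_{n+1}} |f'(t)|^2\,dt,$$

since $f(t_n) = 0$ $\forall n$. Hence

$$\int |f(t)|^2\,dt = \sum_n \int_{t_n}^{t_{n+1}} |f(t)|^2\,dt \;\le\; \frac{B^2}{\pi^2} \sum_n \int_{t_n}^{t_{n+1}} |f'(t)|^2\,dt = \frac{B^2}{\pi^2} \int |f'(t)|^2\,dt .$$

Combining these two inequalities, we have

$$\int |f(t)|^2\,dt \;\le\; \frac{B^2}{\pi^2} \int |f'(t)|^2\,dt \;\le\; \frac{B^2}{\pi^2} (2\pi\Omega)^2 \int |f(t)|^2\,dt = (2\Omega B)^2 \int |f(t)|^2\,dt .$$

If $2\Omega B < 1$, this last series of inequalities is impossible. Hence, $f \equiv 0$.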
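
A standard example, added here only as an illustration and not part of the original argument, suggests why some bound on the gap $B$ is needed. It assumes the Fourier transform convention $\hat{f}(\gamma) = \int f(t) e^{-2\pi i t\gamma}\,dt$ and takes the function whose transform is the characteristic function of $[-\Omega,\Omega]$:

```latex
\[
  f(t) = \frac{\sin(2\pi\Omega t)}{\pi t}, \qquad
  \hat{f} = 1_{[-\Omega,\Omega]}, \qquad
  f\Bigl(\frac{n}{2\Omega}\Bigr) = 0 \quad \forall n \in \mathbb{Z}\setminus\{0\}, \qquad
  f(0) = 2\Omega \neq 0 .
\]
% Zero set: t_n = n/(2\Omega), n \neq 0. Consecutive zeros are 1/(2\Omega) apart,
% except across the origin, where the gap is 1/\Omega. Hence
\[
  B = \frac{1}{\Omega}, \qquad 2\Omega B = 2 > 1 .
\]
```

Here $f \in L^2(\mathbb{R})$ is nonzero and bandlimited to $[-\Omega,\Omega]$, yet it vanishes on a sequence satisfying conditions 1) to 3); this is consistent with Theorem 3.1 because the hypothesis $2\Omega B \le 1$ fails.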

Alternate proof (for $2\Omega B < 1$ only). Let $\Omega_1 > \Omega$ be such that
$B < \frac{1}{2\Omega_1}$. We show that the sequence $\{t_n\}$ contains a subsequence $\{t_{n_m}\}$ that
generates a frame of exponentials for $L^2[-\Omega_1,\Omega_1]$, and hence for $L^2[-\Omega,\Omega]$
as well.

To begin, pick $\varepsilon > 0$ small enough so that $B < \frac{1}{2\Omega_1+\varepsilon}$. Form symmetric
intervals around $\frac{m}{2\Omega_1+\varepsilon}$, $m \in \mathbb{Z}$, of length $\frac{1}{2\Omega_1+\varepsilon} - \delta \ge B$ with $\delta > 0$ small. Since

$$\sup_{n \in \mathbb{Z}} (t_{n+1} - t_n) = B \le \frac{1}{2\Omega_1+\varepsilon} - \delta,$$

each interval centered around a multiple of $\frac{1}{2\Omega_1+\varepsilon}$ must contain at least one
$t_n$. Discard all elements except one of the sequence $\{t_n\}$ in each symmetric
interval, and discard all elements that fall between the intervals. Label
the element remaining in the interval centered at $\frac{m}{2\Omega_1+\varepsilon}$ as $t_{n_m}$. Then the
subsequence $\{t_{n_m}\}$ of $\{t_n\}$ is a uniformly dense sequence with uniform density
$2\Omega_1+\varepsilon$, i.e., $\{t_{n_m}\}$ satisfies $|t_{n_k} - t_{n_m}| \ge \delta > 0$ for $k \ne m$ and

$$\left| t_{n_m} - \frac{m}{2\Omega_1+\varepsilon} \right| \le \frac{1}{2(2\Omega_1+\varepsilon)} .$$

Hence, by the theorem of Duffin and Schaeffer (Theorem 2.2), the collection
of exponentials $\{e_{t_{n_m}}\}$ is a frame for $L^2[-\Omega_1,\Omega_1]$. So, by applying Algorithm
2.7, we obtain a sampling expansion for entire functions of exponential type
$\Omega$ employing the sample values $\{f(t_{n_m})\}$.
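
The selection step in the alternate proof can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `select_subsequence`, the sample points, and the values of the spacing $1/(2\Omega_1+\varepsilon)$ and of $\delta$ are hypothetical, and a finite list stands in for the bi-infinite sequence $\{t_n\}$.

```python
def select_subsequence(points, spacing, length, m_values):
    """Keep one sample per symmetric interval centered at m * spacing.

    `spacing` plays the role of 1/(2*Omega_1 + eps) and `length` the role of
    spacing - delta; points that fall between the intervals are discarded.
    """
    half = length / 2.0
    chosen = {}
    for m in m_values:
        center = m * spacing
        inside = [t for t in sorted(points) if center - half <= t <= center + half]
        if inside:
            chosen[m] = inside[0]  # keep one element, discard the others
    return chosen


# Hypothetical data: maximal gap B = 0.7, spacing 1.0, delta = 0.2,
# so each interval has length 0.8 >= B and must contain a sample.
samples = [-0.1, 0.5, 1.15, 1.8, 2.3, 2.95, 3.65, 4.25]
spacing, delta = 1.0, 0.2
kept = select_subsequence(samples, spacing, spacing - delta, range(5))
print(kept)
```

Because neighbouring intervals are separated by a gap of width $\delta$, any two retained points differ by at least $\delta$, and each retained point lies within half an interval length of its center; these are the two properties used to invoke the Duffin-Schaeffer theorem.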


Counterexample 3.2. We show by example that if we let $2\Omega B = 1$, then there
exist  sequences  that  do  not  contain  a  subsequence  which  generates  a  frame 
of  exponentials.  Consider  the  sequence  defined  by  t„  =  for  n  C  C  and 


for  n.  >  0  odd 
for  n  >  0  even 

for $\delta > 0$. Consider the following observations:

1) Observe that the positively indexed terms are clustered together in
pairs: an oddly indexed term and its following evenly indexed term.
The two elements of each pair get arbitrarily close together as $n \to \infty$,
since the difference between them tends to zero. Hence if this sequence
does contain a uniformly dense subsequence, a subsequence which, by
definition, is also uniformly discrete, then only a finite number of terms
of this subsequence can be taken from both elements of these pairs. So,
for $n > 0$ large enough, at most one element can come from each of
these pairs in any uniformly dense subsequence.

2) Note that if a subsequence $\{t_{n_k}\}$ is uniformly dense with uniform
density $d$, the subsequence can grow no faster or slower than $\frac{k}{d}$ since,
from the definition of a uniformly dense sequence,

$$\left| t_{n_k} - \frac{k}{d} \right| \le L$$

for some $L \ge 0$.

We claim that we cannot select a uniformly dense subsequence from $\{t_n\}$,
as we cannot find a suitable density for this subsequence that satisfies the
growth condition (2). This is because any subsequence that obeys observation
(1) will grow faster than the sequence $\{\frac{n}{2\Omega+\varepsilon}\}$ for $\varepsilon \ge 0$ by virtue of the
extra term added to the positively indexed elements, whose partial sums are
unbounded. However, any uniformly dense subsequence will grow slower
than $\frac{n}{2\Omega-\varepsilon}$ for $\varepsilon > 0$ since $t_{n+1} - t_n = B = \frac{1}{2\Omega}$ for $n$ even.


Remark 3.3.

1) In the alternate proof for $2\Omega B < 1$, we selected a subsequence $\{t_{n_m}\}$ of
$\{t_n\}$ such that $\{e_{t_{n_m}}\}$ generated a frame of exponentials for $L^2[-\Omega_1,\Omega_1]$.
By applying the algorithm, we obtain formulas enabling us to reconstruct
the function $f$ from its sample values at the points $t_{n_m}$. (The
remaining points of the sequence $\{t_n\}$ not in the subsequence $\{t_{n_m}\}$ can
be discarded, or, if they can be partitioned into a finite number of uniformly
discrete subsequences, they can be incorporated into the frame
of exponentials generated by $\{t_{n_m}\}$.) So we have obtained a representation
theorem corresponding to the uniqueness theorem in the case
$2\Omega B < 1$.

2) Note also that the work of Jaffard and of Duffin and Schaeffer allows us
to produce frames of exponentials, and hence uniqueness and representation
theorems, for sequences which do not satisfy the restriction
$2\Omega B \le 1$. In particular, since in the definition of uniformly dense
sequences $L$ can be any positive number, we can generate uniformly
dense sequences with large gaps, i.e., for which the distance between
certain consecutive points is larger than $\frac{1}{2\Omega}$. Hence we can extend
the Gabardo-Walker Theorem to irregular sampling lattices that do not
satisfy all of the restrictions of the hypotheses of that theorem.

3) It can be shown [10, Corollary 4.4.4] that the lower frame bound for the
frame of exponentials in $L^2[-\Omega_1,\Omega_1]$ generated in the alternate proof
is  "  '  .  (See  also  Remark  2.4  (4).) 

4) For another approach to irregular sampling, see the paper by Karlheinz
Gröchenig [7] in this volume. The method described there, and in the
joint papers with Hans Feichtinger listed in the references of that paper,
applies in a wide variety of function spaces on various groups. This is
to be contrasted with the method described here, which applies only in
$L^2(\mathbb{R})$.


4. Notation

The Fourier transform $\hat{f}$ of $f \in L^1(\mathbb{R})$ is defined as

$$\hat{f}(\gamma) = \int f(t)\, e^{-2\pi i t\gamma}\, dt,$$

where "$\int$" designates integration over the real line $\mathbb{R}$; $\hat{f}$ is defined on $\hat{\mathbb{R}}$ ($= \mathbb{R}$)
and $\check{f}$ is the inverse Fourier transform of $f$. The Fourier transform is defined
on $L^2(\mathbb{R})$, and, for fixed $\Omega > 0$,

$$PW_\Omega \equiv \{ f \in L^2(\mathbb{R}) : \operatorname{supp} \hat{f} \subseteq [-\Omega,\Omega] \},$$

where $\operatorname{supp} \hat{f}$ is the support of $\hat{f}$. Functions that are in the space $PW_\Omega$ are
called $\Omega$-bandlimited.

Besides the $L^p(\mathbb{R})$-spaces, we deal with the space $C^\infty(\mathbb{R})$ of infinitely
differentiable functions and its subspace $C_c^\infty(\mathbb{R})$ whose elements have
compact support.

"$\sum$" designates summation over the whole discrete group in question,

e. g.,  over  Z  where  Z  is  the  group  of  integers.  The  function  l.s  is  the  character¬ 
istic  function  of  S  C  fR,  |S|  is  the  Lebesgue  measure  of  S,  and  laj,  = 

The function $\delta_{mn}$ is defined as 0 if $m \ne n$ and as 1 if $m = n$. The dilation

f, \  of  the  function  f  is  f\(t)  •-  Af(At).  Finally,  the  exponential  function  E,,  Es 


5.  Bibliography 

[1] J. Benedetto and W. Heller. Irregular sampling and the theory of frames,
I. Note Mat., 1991.

[2] A. Beurling and P. Malliavin. On the closure of characters and the zeros
of entire functions. Acta Math., 118:79-93, 1967.

[3] I. Daubechies. The wavelet transform, time-frequency localization, and
signal analysis. IEEE Trans. Inform. Theory, 36:961-1005, 1990.

[4] R. Duffin and A. Schaeffer. A class of nonharmonic Fourier series. Trans.
Am. Math. Soc., 72:341-366, 1952.

[5] H. Dym and H. P. McKean. Fourier series and integrals. Academic
Press, New York, 1972.

[6] J.-P. Gabardo. Spectral gaps and uniqueness problems in Fourier analysis.
PhD thesis, University of Maryland, College Park, MD, 1987.

[7] Karlheinz Gröchenig. Sharp results on random sampling of bandlimited
functions. In J.S. Byrnes, Jennifer L. Byrnes, Karl Berry, and Kathryn A.
Hargreaves, editors, Probabilistic and Stochastic Methods in Analysis,
with Applications: Proceedings of the NATO Advanced Study Institute,
NATO ASI series C: Mathematical and physical sciences, Dordrecht,
Boston, London, 1992. Kluwer Academic Publishers Group. Held at Il
Ciocco Resort, Tuscany, Italy.

[8] G. H. Hardy, J. E. Littlewood, and G. Pólya. Inequalities. Cambridge
University Press, 1952.

[9] C. Heil and D. Walnut. Continuous and discrete wavelet transforms.
SIAM Review, 31:628-666, 1989.

[10] W. Heller. Frames of exponentials and applications. PhD thesis,
University of Maryland, College Park, MD, 1991.

[11] S. Jaffard. A density criterion for frames of complex exponentials. Mich.
J. of Math., 38:339-348, 1991.

[12] M. Kadec. The exact value of the Paley-Wiener constant. Sov. Math.
Dokl., 5:559-561, 1964.

[13] H. Landau. Necessary density conditions for sampling and interpolation
of certain entire functions. Acta Math., 117:37-52, 1967.

[14] B. Ja. Levin. Distribution of zeros of entire functions, volume 5 of Transl.
of Math. Monographs. Am. Math. Soc., Providence, RI, rev. edition, 1980.

[15] N. Levinson. Gap and density theorems, volume 26 of Am. Math. Soc.
Colloq. Publ. Am. Math. Soc., Providence, RI, 1940.

[16] R. Paley and N. Wiener. Fourier transforms in the complex domain,
volume 19 of Am. Math. Soc. Colloq. Publ. Am. Math. Soc., Providence,
RI, 1934.

[17] R. Redheffer. Completeness of sets of complex exponentials. Advances
in Math., 24:1-62, 1977.

[18] W. Walker. The separation of zeros for entire functions of exponential
type. J. of Math. Analysis and Appl., 122:257-259, 1987.

[19] K. Yao and J. Thomas. On some stability and interpolatory properties
of nonuniform sampling expansions. IEEE Trans. Circuit Theory,
CT-14:404-408, 1967.

[20] R. Young. An introduction to nonharmonic Fourier series. Academic
Press, New York, 1980.


Stationary frames and spectral estimation

John J. Benedetto

Department of Mathematics
University of Maryland
College Park, Maryland 20742 USA
jjb@math.umd.edu

Also at Prometheus Inc.


Dedicated to Professor George Maltese on the occasion of his sixtieth birthday.

Kolmogorov's fundamental paper on stationary sequences (1941) played
a major role in important problems dealing with stochastic processes. His
results are reviewed here in the context of their relations with three topics
in harmonic analysis. The topics are weighted Fourier transform norm
inequalities, stationary frames, and Wiener-Plancherel formulas.

Kolmogorov's prediction theory led to weighted Hilbert transform
inequalities which, in turn, are characterized by $A_p$-weights. These weights
identify a special collection of weighted Fourier transform inequalities,
including results of Hardy, Littlewood, and Paley. The extension to more
general Fourier transform inequalities leads to restriction theorems and
uncertainty principle inequalities.

Stationary frames establish a conceptual distinction between wavelets
and coherent states. They are developed from Kolmogorov's spectral
characterization of minimal sequences.

Wiener-Plancherel formulas are used in the spectral estimation
associated with the mathematical setting of both Kolmogorov's and Wiener's
prediction theories.

1. Introduction

Fifty years ago, in 1941, Kolmogorov published his monumental paper,
Stationary sequences in Hilbert space [39]. As Cramér pointed out [17, p. 532],

"The  fundamental  importance  of  this  work  by  Kolmogorov  lies  in 
the  fact  that  he  showed  how  the  abstract  theory  of  Hilbert  space  (as 
well,  of  course,  as  of  other  types  of  spaces)  could  be  applied  to  the 
theory  of  random  variables  and  stochastic  processes." 

Moreover,  in  [39]  and  its  sequel  Interpolation  and  extrapolation  of  stationary 
random  sequences  (1941),  Kolmogorov  introduced  the  basic  concepts  of 
deterministic  and  purely  nondeterministic  stationary  sequences,  and  posed 
and  solved  the  primary  problems  in 

A:  Prediction  theory 

B:  Spectral  theory  of  minimal  stationary  sequences. 

The setting for these two areas is based on the

C: Wiener-Khinchin theorem.

From the point of view of stationarity, the wonderful and influential ideas
formulated in [39] are now standard fare in probability theory, and to some
extent they have been played out, especially in the (multivariate) discrete
semi-infinite prediction theoretic case, e.g., [43], [47], [53, Volume III,
including the updates by Masani (pp. 276-306), Salehi (pp. 307-338), Muhly
(pp. 339-370), and Kallianpur (pp. 402-424)], cf. [20]. There is still a great
deal to be done in the case of stationary fields, e.g., [15], [37], and Section 4.2.
Our goal in Sections 3-5 is to describe recent results from three topics of
modern harmonic analysis which are in the intellectual lineage of the above
items A, B, and C, respectively. A will lead to the topic of weighted Fourier
transform norm inequalities, B to a topic in wavelet and coherent-states
theory, and C to multidimensional Wiener-Plancherel theorems.

Section 2 is devoted to a commentary on parts of [39], and we have
resisted the temptation to record much subsequent related material on
stochastic processes and prediction theory. We have lectured on the relation between
prediction theory and weighted Fourier transform norm inequalities since
the early 1980s, and most of the material in Section 3 is taken from those
lectures. Section 4 records some of the preliminary ideas being used in our
current work on stationary frames. Section 5 rounds out our view of the
type of harmonic analysis affected by [39]; the format in Section 5 is just to
state our recent published results [6].

Besides the usual notation in analysis as found in the books by
Hörmander [35], Schwartz [49], and Stein and Weiss [50], we shall use the
conventions and notation described at the end of the paper.

2.  Kolmogorov  and  stationary  sequences 

2.1.  The  Wiener-Khinchin  theorem 

Definition 2.1. A sequence $\{x(n) : n \in \mathbb{Z}\}$ in a complex Hilbert space $H$ is
stationary if the inner product

$$R(n) = R_{xx}(n) = \langle x(n+k), x(k) \rangle, \quad n \in \mathbb{Z},$$

is independent of $k$. $R_{xx}$ is the autocorrelation of $x$. Two stationary sequences
$\{x(n)\}$ and $\{y(n)\}$ are stationarily correlated if the inner product,

$$R_{xy}(n) = \langle x(n+k), y(k) \rangle, \quad n \in \mathbb{Z},$$

is independent of $k$. Clearly,

$$\forall n \in \mathbb{Z}, \quad R_{xy}(n) = \overline{R_{yx}(-n)} .$$

Theorem 2.2 (Wiener-Khinchin).

1) Given a stationary sequence $\{x(n)\} \subseteq H$, there is $\mu \in M_+(\mathbb{T})$ for which
$R_{xx} = \check{\mu}$; $\mu$ is the power spectrum of $x$, cf. Definition 5.9.

2) Given $\mu \in M_+(\mathbb{T})$, there is a stationary sequence $\{x(n)\}$ for which
$R_{xx} = \check{\mu}$.
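
A small worked example may help fix the definitions; it is added here as an illustration and is not taken from [39]. The orthonormal vectors $e_j \in H$, the frequencies $\gamma_j \in \mathbb{T}$, and the amplitudes $a_j \in \mathbb{C}$ are hypothetical, and we assume the inner product is linear in its first argument and that $\check{\mu}(n) = \int_{\mathbb{T}} e^{2\pi i n\gamma}\,d\mu(\gamma)$.

```latex
\[
  x(n) = \sum_{j=1}^{J} a_j e^{2\pi i n \gamma_j} e_j ,
  \qquad
  \langle x(n+k), x(k) \rangle
  = \sum_{j=1}^{J} |a_j|^2 e^{2\pi i (n+k) \gamma_j} e^{-2\pi i k \gamma_j}
  = \sum_{j=1}^{J} |a_j|^2 e^{2\pi i n \gamma_j} .
\]
% The right-hand side is independent of k, so {x(n)} is stationary, and
\[
  R_{xx}(n) = \check{\mu}(n) = \int_{\mathbb{T}} e^{2\pi i n \gamma}\, d\mu(\gamma),
  \qquad
  \mu = \sum_{j=1}^{J} |a_j|^2 \delta_{\gamma_j} \in M_+(\mathbb{T}) .
\]
```

In this example the power spectrum is purely discrete: it consists of point masses $|a_j|^2$ at the frequencies $\gamma_j$.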


Remark 2.3.

1) Item 1 of Theorem 2.2 is immediate since $R_{xx}$ is positive definite, thereby
allowing us to apply Herglotz's theorem.

2) For Item 2 of Theorem 2.2, we first note that $\check{\mu} = R$ is positive definite.
For the case of stationary stochastic processes $x$, e.g., [45], the problem
is to construct $x$ for which $R_{xx} = R$. This was done by Khinchin (1934)
on $\mathbb{R}$, by Wold (1938) on $\mathbb{Z}$, and in a more general setting on both
$\mathbb{R}$ and $\mathbb{Z}$ by Cramér (1940) [16, p. 224]; the method is standard, e.g.,
[20, pp. 62-63 and pp. 72-73], [46, pp. 221-222], and the constructed
process is Gaussian. In fact, in this Gaussian case there is a natural
bijection between $M_+(\mathbb{T})$ (or $M_{b+}(\mathbb{R})$) and stationary Gaussian
processes subjected to a mild technical constraint. The argument in [20]
uses the Kolmogorov extension theorem found in his classical book
(1933). This should be compared with Kolmogorov's abstract "Hermitian
extension" method [39, Lemma 1 and Lemma 2], which he used
to prove Item 2 of Theorem 2.2 and its generalization for stationarily
correlated sequences, viz., [39, Theorems 4, 5, and 6]. A footnote in [16,
p. 221] as well as a reference in [39] indicate that both Cramér and
Kolmogorov were aware of the other's similar results, which were proved
by different methods and resulted in more generality in [39].

3) Wiener's contribution (1930) to the Wiener-Khinchin theorem was
formulated in nonprobabilistic terms, cf. [27], and led to the constructive
Wiener-Wintner theorem (1939) on $\mathbb{R}$. Bass and Bertrandias made
significant contributions to this result; and recently my student, R. Kerby,
and I have proven the Wiener-Wintner theorem in $\mathbb{R}^d$. One basic
construction is given in [2] and two others, which are quite ingenious, are
contained in [38].

Our result is

Theorem 2.4 (Wiener-Wintner). Given $\mu \in M_{b+}(\mathbb{R}^d)$, there is a
constructible function $x \in L^\infty_{loc}(\mathbb{R}^d)$ such that, for all $t$,

$$\exists\, R(t) = \lim_{T\to\infty} \frac{1}{|B(0,T)|} \int_{B(0,T)} x(t+u)\,\overline{x(u)}\,du \qquad (2.1)$$

and

$$R(t) = \check{\mu}(t) .$$

The ordinary point function $R$ in (2.1) and its probabilistic counterpart
$R_x$, defined by a stationary stochastic process $x(t,\omega)$, are essentially
equivalent in correlation ergodic processes, e.g., [45]; the role of Theorem 2.4
in spectral estimation is discussed in [2], [5, Section 5], and Section 5.4 of
this paper. Given $x \in L^\infty_{loc}(\mathbb{R}^d)$ and $R$ defined by (2.1), the converse of
Theorem 2.4 is immediate by Bochner's theorem.
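
The simplest instance of (2.1) can be computed by hand. The following is an illustration added here, for $d = 1$, with a hypothetical frequency $\gamma_0 \in \mathbb{R}$ and amplitude $a \in \mathbb{C}$; in one dimension $|B(0,T)| = 2T$.

```latex
\[
  x(t) = a\, e^{2\pi i t \gamma_0}
  \quad\Longrightarrow\quad
  \frac{1}{2T} \int_{-T}^{T} x(t+u)\,\overline{x(u)}\,du
  = \frac{1}{2T} \int_{-T}^{T} |a|^2 e^{2\pi i t \gamma_0}\,du
  = |a|^2 e^{2\pi i t \gamma_0}
  \quad \text{for every } T > 0 .
\]
% Hence the limit in (2.1) exists and
\[
  R(t) = |a|^2 e^{2\pi i t \gamma_0} = \check{\mu}(t),
  \qquad \mu = |a|^2 \delta_{\gamma_0} .
\]
```

For a finite sum of such exponentials with distinct frequencies, the cross terms average to zero as $T \to \infty$, so $R$ is again the inverse transform of a sum of point masses.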

2.2.  The  fundamental  isometry  and  structure  theorems 

In his work of 1941, as well as in an earlier Comptes Rendus note (1939),
Kolmogorov solved the problem of predicting the future from the whole past,
cf. Item 1 of Remark 2.10. The following elementary observation plays a role
in this solution by transferring a large class of statistical prediction problems
into problems of trigonometric approximation in weighted Lebesgue spaces.

Theorem 2.5. Given a stationary sequence $\{x(n)\} \subseteq H$, with power spectrum
$\mu$. Let $H(x) = \overline{\mathrm{sp}}\{x(n)\}$. The mapping,

$$Z : L^2_\mu(\mathbb{T}) \to H(x),$$

defined linearly on $\mathrm{sp}\{e^{2\pi i n\gamma}\}$ by $Z(e^{2\pi i n\gamma}) = x(n)$, extends to a linear
isometric isomorphism, cf. [39, Lemma 4 and its proof in terms of the spectral
representation of the shift operator $U$].

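Proof. Since $\mu$ is a bounded Radon measure we have that $e^{2\pi i n\gamma} \in L^2_\mu(\mathbb{T})$.
For finite sums $p(\gamma) = \sum c_n e^{2\pi i n\gamma} \in L^2_\mu(\mathbb{T})$ and $x = \sum c_n x(n) \in H(x)$, we
compute

$$\|p\|_{2,\mu}^2 = \sum_{m,n} c_m \overline{c_n}\, \check{\mu}(m-n) = \sum_{m,n} c_m \overline{c_n}\, R(m-n) = \sum_{m,n} c_m \overline{c_n}\, \langle x(m-n+k), x(k) \rangle$$

$$= \sum_{m,n} c_m \overline{c_n}\, \langle x(m), x(n) \rangle = \Bigl\langle \sum_n c_n x(n), \sum_n c_n x(n) \Bigr\rangle,$$

and so $Z$ is an isometry on $\mathrm{sp}\{e^{2\pi i n\gamma}\}$.

Next, we see that $\overline{\mathrm{sp}}\{e^{2\pi i n\gamma}\} = L^2_\mu(\mathbb{T})$. In fact, for a given $\varepsilon > 0$
and $f \in L^2_\mu(\mathbb{T})$, there is a continuous function $g$ on $\mathbb{T}$ and a trigonometric
polynomial $p$ such that

$$\|f - g\|_{2,\mu} < \varepsilon/2 \quad\text{and}\quad \|g - p\|_\infty < \frac{\varepsilon}{2\|\mu\|_1^{1/2}} .$$

Consequently,

$$\|f - p\|_{2,\mu} < \frac{\varepsilon}{2} + \|g - p\|_{2,\mu} \le \frac{\varepsilon}{2} + \|g - p\|_\infty \Bigl( \int d\mu(\gamma) \Bigr)^{1/2} < \varepsilon .$$

Thus, by general considerations, $Z$ is a linear isometry from $L^2_\mu(\mathbb{T})$ onto a
dense subspace of $H(x)$. In particular, $Z$ is injective. Finally, taking $y \in H(x)$
and using the Cauchy sequences, $\{y_n\} \subseteq H(x)$ and $\{Z^{-1}y_n\} \subseteq L^2_\mu(\mathbb{T})$, where
$\lim y_n = y$ and $\lim Z^{-1}y_n = f$, it is easy to check that $Zf = y$; and so $Z$ is a
surjection onto $H(x)$.

The verification in the proof of Theorem 2.5, that $\overline{\mathrm{sp}}\{e^{2\pi i n\gamma}\} = L^2_\mu(\mathbb{T})$,
also works for stationary functions,

$$x : \mathbb{R}^d \to H, \qquad (2.2)$$

i.e., functions (2.2) for which $\langle x(t+s), x(s) \rangle$ is independent of $s$. In fact, we
take $g \in C_c(\mathbb{R}^d)$, with $\operatorname{supp} g$ contained in a cube $Q$, and choose $p 1_Q$ instead
of $p$. Another elementary proof that

$$\overline{\mathrm{sp}}\{e^{2\pi i t\gamma} : t \in \mathbb{R}^d\} = L^2_\mu(\mathbb{R}^d),$$

where the power spectrum $\mu_x = \mu$ is an element of $M_{b+}(\mathbb{R}^d)$, utilizes the
Hahn-Banach theorem instead of the Weierstrass approximation theorem.
In this case, the argument is completed by the uniqueness theorem for the
Fourier transform.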
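
The isometry computation in the proof of Theorem 2.5 can be checked numerically in a toy case. The sketch below is only an illustration: it assumes a hypothetical discrete power spectrum $\mu = \sum_j w_j \delta_{\gamma_j}$, and the function names are not from the source. It compares $\|p\|_{2,\mu}^2$ with $\sum_{m,n} c_m \overline{c_n} R(m-n)$.

```python
import cmath

# Hypothetical discrete power spectrum: mu = sum_j w_j * delta_{gamma_j}, w_j > 0.
gammas = [0.1, 0.35, 0.8]
weights = [2.0, 0.5, 1.0]


def autocorrelation(n):
    """R(n) = integral of e^{2 pi i n gamma} d mu(gamma) for the discrete mu."""
    return sum(w * cmath.exp(2j * cmath.pi * n * g) for g, w in zip(gammas, weights))


def norm_sq_in_L2_mu(coeffs):
    """||p||^2 in L^2_mu for p(gamma) = sum_n c_n e^{2 pi i n gamma}."""
    total = 0.0
    for g, w in zip(gammas, weights):
        p = sum(c * cmath.exp(2j * cmath.pi * n * g) for n, c in coeffs.items())
        total += w * abs(p) ** 2
    return total


def norm_sq_via_autocorrelation(coeffs):
    """sum_{m,n} c_m conj(c_n) R(m - n), the squared norm of sum_n c_n x(n)."""
    total = 0j
    for m, cm in coeffs.items():
        for n, cn in coeffs.items():
            total += cm * complex(cn).conjugate() * autocorrelation(m - n)
    return total


# Hypothetical coefficients c_n, indexed by n.
coeffs = {-1: 1 + 2j, 0: 0.5, 3: -1j}
lhs = norm_sq_in_L2_mu(coeffs)
rhs = norm_sq_via_autocorrelation(coeffs)
print(lhs, rhs)
```

The two quantities agree up to rounding, and the second has negligible imaginary part, which is the finite-sum identity on which the isometry rests.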

The structure of bounded measures on $\mathbb{R}$ (or $\mathbb{T}$) is given by

Theorem 2.6. Each $\mu \in M_b(\mathbb{R})$ can be written in the form

$$\mu = f_{ac} + \mu_s = f_{ac} + \mu_{sc} + \mu_d,$$

where $f_{ac} \in L^1(\mathbb{R})$, $\mu_s$ is the singular part of $\mu$, $\mu_{sc} \in M_b(\mathbb{R})$ is designated
the continuous singular part of $\mu$, and $\mu_d = \sum a_\gamma \delta_\gamma \in M_b(\mathbb{R})$, where
$\sum |a_\gamma| < \infty$. Further,

$$\mu = F',$$

the distributional derivative of a function of bounded variation (BV), and

$$F = F_{ac} + F_{sc} + \sum a_\gamma H_\gamma,$$

where $F_{ac} \in BV$ is locally absolutely continuous, $F_{sc} \in BV$ is a continuous
function whose ordinary derivative vanishes a.e., and $H_\gamma$ is the Heaviside
function with jump at $\gamma$. Finally,

$$F_{ac}' = f_{ac}, \quad F_{sc}' = \mu_{sc}, \quad \Bigl( \sum a_\gamma H_\gamma \Bigr)' = \sum a_\gamma \delta_\gamma,$$

under distributional differentiation.


2.3. The Wold decomposition and deterministic sequences

Definition 2.7.

1) Given a nonzero stationary sequence $\{x(n)\} \subseteq H$. Besides the notation

$$H(x) = \overline{\mathrm{sp}}\{x(n)\} = H(x,\infty),$$

defined in Theorem 2.5 and which does not depend on stationarity, we
now define the closed subspaces,

$$\forall n, \quad H(x,n) = \overline{\mathrm{sp}}\{x(k) : k \le n\}$$

and

$$H(x,-\infty) = \bigcap_n H(x,n) .$$

Clearly,

$$\forall n, \quad H(x,-\infty) \subseteq H(x,n) \subseteq H(x,\infty) .$$

• $\{x(n)\}$ is deterministic if

$$H(x,-\infty) = H(x,\infty);$$

• $\{x(n)\}$ is nondeterministic if

$$H(x,-\infty) \ne H(x,\infty);$$

• $\{x(n)\}$ is purely nondeterministic if

$$H(x,-\infty) = \{0\} \quad (\ne H(x,\infty)) .$$

2) In 1941, Kolmogorov was aware of the Wold decomposition (1938),
whereas Wiener, in his independent development of prediction theory,
was not, e.g., [44, p. 193]. Because of the role of Wold's result in [39],
we state the Wold decomposition, which, notwithstanding its origins,
is a theorem about operators on a Hilbert space and nondeterministic
sequences from Kolmogorov's point of view.

Let $\{x(n)\}$ be a nondeterministic stationary sequence with shift
operator $U = U_x$ on $H(x,\infty)$ defined by $U(x(n)) = x(n+1)$ on $\{x(n)\}$.
Then there are stationary sequences $\{u(n)\}$ and $\{v(n)\}$, and a unique
decomposition,

$$\forall n, \quad x(n) = u(n) + v(n),$$

such that

a) $\{u(n)\}$ is purely nondeterministic and $\{v(n)\}$ is deterministic,

b) $\forall n$, $H(u,n) \cup H(v,n) \subseteq H(x,n)$,

c) $H(u,\infty) \perp H(v,\infty)$,

d) $\forall n$, $v(n)$ is the projection of $x(n)$ onto $H(x,-\infty)$.

3) Given the hypotheses of the Wold decomposition in Item 2. Since
$\{x(n)\}$ is nondeterministic, $H(x,0) \ne H(x,1)$. There is an essentially
unique unit vector $z(1) \in H(x,1)$ such that $z(1) \perp H(x,0)$ and
$\overline{\mathrm{sp}}\{z(1), H(x,0)\} = H(x,1)$. Noting that $U$ extends to a unitary operator
on $H(x,\infty)$ and that $U^{-1}$ is the adjoint operator, we define

$$\forall k \in \mathbb{Z}, \quad z(k) = U^{k-1}(z(1)) .$$

By $U$'s definition it is easy to check that $\{z(k)\}$ is orthonormal and that

$$\forall k \in \mathbb{Z}, \quad z(k) \perp H(x,k-1) . \qquad (2.3)$$

Writing the Fourier expansion of $x(1)$ with respect to $\{z(k)\}$, we compute

$$x(1) = \sum_{k=0}^{\infty} c_k z(1-k) + v(1), \qquad (2.4)$$

noting that $c_k = \langle x(1), z(1-k) \rangle = 0$ for $k < 0$ by (2.3). We have $\{c_k\} \in \ell^2(\mathbb{Z})$
and can verify that $v(1) \in H(x,-\infty)$. Applying the operator
$U^{n-1}$ to (2.4) we have

$$\forall n, \quad x(n) = \sum_{k=-\infty}^{n} c_{n-k} z(k) + v(n),$$

and, in particular, the purely nondeterministic sequence $\{u(n)\}$ is a
particular moving average, viz.,

$$\forall n, \quad u(n) = \sum_{k=-\infty}^{n} c_{n-k} z(k) . \qquad (2.5)$$
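
The moving average (2.5) can be explored with a short sketch. This is an illustration only: the coefficients are hypothetical and finitely supported, each vector in the span of the orthonormal $z(k)$ is stored as a dictionary of coefficients, and the function names are not from the source. The check is that $\langle u(n+k), u(k) \rangle$ does not depend on $k$.

```python
def moving_average(c, n):
    """u(n) = sum_{j >= 0} c_j z(n - j), stored as {k: coefficient of z(k)}."""
    return {n - j: cj for j, cj in enumerate(c)}


def inner(u, v):
    """Inner product in the span of the orthonormal z(k), linear in u."""
    return sum(a * complex(v[k]).conjugate() for k, a in u.items() if k in v)


c = [1.0, 0.5, 0.25]  # hypothetical finitely supported c_0, c_1, c_2


def R(n, k):
    """<u(n + k), u(k)>, which should not depend on k."""
    return inner(moving_average(c, n + k), moving_average(c, k))


print(R(0, 0), R(1, 0), R(1, 7), R(2, -4), R(3, 0))
```

For these coefficients the autocorrelation is $R(n) = \sum_j c_{j+n}\overline{c_j}$, which vanishes for $|n| > 2$ because only three coefficients are nonzero.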

The Wold decomposition is equivalent to the power spectral decomposition
of $\mu_x$ into its absolutely continuous and singular parts. For example,
if $\log f_{ac} \in L^1(\mathbb{T})$, where $\mu_x = f_{ac} + \mu_s$, cf. Theorem 2.6, then
$\{u(n)\}$ corresponds to $f_{ac}$ and $\{v(n)\}$ corresponds to $\mu_s$, cf. Theorem 2.8.
If $\log f_{ac} \notin L^1(\mathbb{T})$, where $\mu_x = f_{ac} + \mu_s$, then $\{v(n)\}$ corresponds to all of $\mu_x$.
This material is well-traveled and there are many points of view, e.g., [20,
Chapter 4], [27, pp. 259-261], [42, pp. 62ff], [46, pp. 735-759]. We shall exposit
Kolmogorov's original formulation from [39, Sections 8-10], which he points
out is "more unfamiliar (than Sections 1-7) and seems to be really new."

2.4. Spectral characterizations of deterministic sequences

Theorem 2.8. [39, Theorem 22] Given a stationary sequence $\{x(n)\} \subseteq H$
with power spectrum $\mu$. $\{x(n)\}$ is purely nondeterministic if and only if
$\mu = f_{ac} \in L^1(\mathbb{T})$ and

$$\log f_{ac} \in L^1(\mathbb{T}) .$$

(In particular, $\mu > 0$ a.e. and $\operatorname{supp} \mu = \mathbb{T}$.)

One of the features of such a result is that, when we know the
autocorrelation of a process (which is often experimentally available), we can
characterize the prediction theoretic properties of the underlying process. There
are analogous results for related filter problems, e.g., [3, Theorem IV.2.1] and
the thesis [55] of my student, G. Yang.

Theorem 2.9. [39, Theorem 23] Given a stationary sequence $\{x(n)\} \subseteq H$ with
power spectrum $\mu = f_{ac} + \mu_s$.

1) If $f_{ac} = 0$ on a set of positive Lebesgue measure then $\{x(n)\}$ is
deterministic.

2) If $f_{ac} > 0$ a.e. and $\log f_{ac} \notin L^1(\mathbb{T})$ then $\{x(n)\}$ is deterministic.

3) If $f_{ac} > 0$ a.e. and $\log f_{ac} \in L^1(\mathbb{T})$ then $\{x(n)\}$ is nondeterministic,
cf. Item 2.
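
The integrability condition on $\log f_{ac}$ can be examined numerically for a simple spectrum. The sketch below is an illustration under stated assumptions: it uses the hypothetical spectral density $f(\gamma) = |1 + a e^{-2\pi i\gamma}|^2$, which is the power spectrum of the moving average $u(n) = z(n) + a z(n-1)$ with $\{z(k)\}$ orthonormal, and a midpoint rule; the function name is not from the source.

```python
import cmath
import math


def mean_log_spectrum(a, num_points=4096):
    """Midpoint-rule estimate of the integral over [0, 1) of log f(gamma),
    where f(gamma) = |1 + a e^{-2 pi i gamma}|^2 (hypothetical MA(1) spectrum)."""
    total = 0.0
    for j in range(num_points):
        gamma = (j + 0.5) / num_points
        f = abs(1 + a * cmath.exp(-2j * cmath.pi * gamma)) ** 2
        total += math.log(f)
    return total / num_points


print(mean_log_spectrum(0.5), mean_log_spectrum(2.0))
```

By Jensen's formula the exact value is $0$ for $|a| \le 1$ and $2\log|a|$ for $|a| > 1$, so $\log f \in L^1(\mathbb{T})$ in every case and the corresponding sequence is nondeterministic. By contrast, a density that vanishes on a set of positive measure has $\log f = -\infty$ there, which is the situation of Item 1.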


Remark 2.10.

1) Using the definition of a deterministic stationary sequence as well as
the isometric isomorphism in Theorem 2.5, we can rewrite Theorem 2.9
in terms of trigonometric approximation as follows:

Given $\mu = f_{ac} + \mu_s \in M_+(\mathbb{T})$.

$$\overline{\mathrm{sp}}\{e^{2\pi i k\gamma} : k \le 0\} = L^2_\mu(\mathbb{T}) \qquad (2.6)$$

if and only if

$$\log f_{ac} \notin L^1(\mathbb{T}) . \qquad (2.7)$$

Kolmogorov's proof was a consequence of the Szegő alternative, and
there is an elementary presentation of this proof in [1, pp. 261-263].
The completeness statement (2.6) is the analytic formulation of the
prediction theoretic statement, concerning prediction of the future from
the whole past, which we made prior to Theorem 2.5. The result can
first be proved for $\mu = f_{ac}$, and the "reduction" from arbitrary $\mu$ to $f_{ac}$
uses the F. and M. Riesz Theorem.

The analogous result for $\mu \in M_{b+}(\mathbb{R})$ and $L^2_\mu(\mathbb{R})$ is due to Krein
(1945). In this case, "$k \le 0$" is replaced by "$t \le 0$" in (2.6), "$\mathbb{T}$" is
replaced by "$\mathbb{R}$", and (2.7) is replaced by the condition,

$$\int \frac{|\log f_{ac}(\gamma)|}{1 + \gamma^2}\, d\gamma = \infty .$$


2) The relation of Theorem 2.9 (as written in Item 1) to Wiener's Tauberian
theorem, Beurling's spectral analysis, the Denjoy-Carleman theorem
on quasi-analytic classes, and Harry Pollard's solution (1955) of the
Bernstein approximation problem is discussed in [4]. Pollard's basic
lemma on entire functions of exponential type was used by (his student)
de Branges to prove uniqueness criteria in the spirit of work by the
Rieszes, Levinson, and Beurling-Malliavin [4]. A deep, novel, and
applicable distributional analysis of this latter body of work is due to
my student, J.-P. Gabardo [25, 22, 24].


2.5.  Spectral  properties  of  minimal  sequences 

The final notion, which we wish to discuss and that was introduced by
Kolmogorov in [39], is the following:

Definition 2.11.

1) Given a sequence $\{x(n)\} \subseteq H$ and define the closed subspace,

$$H(x,\hat{n}) \equiv \overline{\mathrm{sp}}\{x(k) : k \ne n\} .$$

$\{x(n)\}$ is minimal if

$$\forall n, \quad x(n) \notin H(x,\hat{n}) . \qquad (2.8)$$

In the case of stochastic processes, minimal sequences are "those
for which the random function at any time ($t = n$) is outside the closed
subspace spanned by the past and future functions of the process" [43,
pp. 141-142].

2) If $\{x(n)\}$ is a stationary sequence then either $H(x,\hat{n}) = H(x,\infty)$ for all $n$
or $H(x,\hat{n}) \subsetneq H(x,\infty)$ for all $n$. For example, suppose $H(x,\hat{n}) = H(x,\infty)$
and $m \ne n$. If $H(x,\hat{m}) \ne H(x,\infty)$ then

$$\forall k \ne 0, \quad \langle x(m), x(m+k) \rangle = 0,$$

so that by stationarity,

$$\forall k \ne 0, \quad \langle x(n), x(n+k) \rangle = 0 . \qquad (2.9)$$

By hypothesis, (2.9) implies that $x(n) = 0$, a contradiction. Thus, for
stationary sequences, the criterion (2.8) for minimality can be replaced
by the condition,

$$H(x,\hat{0}) \ne H(x,\infty) . \qquad (2.10)$$

3) Minimality not only plays a natural role in prediction theory, but is an
essential aspect of Köthe's theorem (1936) characterizing Riesz bases.
Köthe's theorem and its role in irregular sampling constitute recent
results with my student, W. Heller [11].


Theorem 2.12. [39, Theorem 24] Given a stationary sequence $\{x(n)\} \subseteq H$,
with power spectrum $\mu = f_{ac} + \mu_s$. $\{x(n)\}$ is minimal if and only if

$$f_{ac}^{-1} \in L^1(\mathbb{T}),$$

cf. the multivariate version $\mathbb{Z} \to H^d$ in [43, Theorem 2.8], [47].
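
A standard example, added here as an illustration and not taken from [39], separates the condition of Theorem 2.12 from that of Theorem 2.8. It assumes an orthonormal sequence $\{z(k)\}$ and the convention $R = \check{\mu}$ used above.

```latex
\[
  u(n) = z(n) - z(n-1), \qquad
  R(0) = 2, \quad R(\pm 1) = -1, \quad R(n) = 0 \ \ (|n| \ge 2),
\]
\[
  f_{ac}(\gamma) = \sum_n R(n) e^{-2\pi i n \gamma}
  = 2 - 2\cos(2\pi\gamma) = 4 \sin^2(\pi\gamma) .
\]
% log f_ac is integrable, but 1/f_ac is not:
\[
  \int_{\mathbb{T}} \log\bigl(4\sin^2(\pi\gamma)\bigr)\, d\gamma = 0,
  \qquad
  \int_{\mathbb{T}} \frac{d\gamma}{4\sin^2(\pi\gamma)} = \infty
  \quad \Bigl(\text{since } \tfrac{1}{4\sin^2(\pi\gamma)} \sim \tfrac{1}{4\pi^2\gamma^2}
  \text{ as } \gamma \to 0\Bigr) .
\]
```

So this moving average is purely nondeterministic by Theorem 2.8 but not minimal by Theorem 2.12. Replacing it by $z(n) + a z(n-1)$ with $|a| < 1$ gives a spectrum bounded away from zero, hence a minimal sequence.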

Theorem 2.12 has had important modifications (even in the case $\mathbb{Z} \to H$)
due to Masani [43] and Rozanov [47], e.g., Theorem 2.17; and these have
stimulated and affected our observations in Section 4 concerning topic B.

Definition 2.13.

1) Given a separable complex Hilbert space $H$. Two sequences $\{x(n)\}$,
$\{y(n)\} \subseteq H$ are biorthonormal if

$$\forall m, n, \quad \langle x(m), y(n) \rangle = \delta_{mn} .$$

2) Given a sequence $\{x(n)\} \subseteq H$. A Hahn-Banach argument shows that
there is a sequence $\{y(n)\} \subseteq H$ so that $\{x(n)\}$, $\{y(n)\}$ are biorthonormal if
and only if $\{x(n)\}$ is minimal. Furthermore, $\{y(n)\}$ is uniquely determined
if and only if $\{x(n)\}$ is not only minimal in $H$ but also $\overline{\mathrm{sp}}\{x(n)\} = H$.

Using the fact stated in Item 2 of Definition 2.13 we can make the
following definition.

Definition  2.14. 

1) Given a minimal, complete sequence $\{x(n)\} \subseteq H$, and let $\{x(n)\}$, $\{y(n)\}$ be biorthonormal. $\{x(n)\}$ is a Bessel sequence if the Bessel map,

$$B : H \to \ell^2(\mathbb{Z}), \qquad x \mapsto \{(x, y(n))\}, \tag{2.11}$$

is a well-defined linear map. $\{x(n)\}$ is a Hilbert sequence if

$$\forall \{c(n)\} \in \ell^2(\mathbb{Z}),\ \exists x_c \in H \text{ such that } \forall n,\ c(n) = (x_c, y(n)).$$

Clearly,  a  Bessel  sequence  is  a  Hilbert  sequence  if  and  only  if  the  Bessel 
map  B  is  surjective. 

2) If $\{x(n)\}$ is a Bessel sequence then, by the uniform boundedness principle, there is a constant $B > 0$ such that

$$\forall x \in H, \quad \sum |(x, y(n))|^2 \le B\|x\|^2. \tag{2.12}$$

Thus, the map $B$ in (2.11) is not only well-defined and linear but also continuous.

If $\{x(n)\} \subseteq H$ is a minimal, complete sequence, and $\{x(n)\}$, $\{y(n)\} \subseteq H$ are biorthonormal then it is not necessarily true that $\overline{\mathrm{sp}}\{y(n)\} = H$; an old example of Kaczmarz and Steinhaus (1935) provides a counterexample, e.g., [34, pp. 19-20]. On the other hand, Masani [43] and Rozanov [47] have used Theorem 2.5 to observe the following lemma.

Lemma 2.15. Given a stationary, minimal, complete sequence $\{x(n)\} \subseteq H$, and let $\{x(n)\}$, $\{y(n)\}$ be biorthonormal. Then $\overline{\mathrm{sp}}\{y(n)\} = H$.

Theorem 2.16. Given a stationary, minimal, complete sequence $\{x(n)\} \subseteq H$, and let $\{x(n)\}$, $\{y(n)\}$ be biorthonormal. Assume $\{x(n)\}$ is both a Bessel sequence and a Hilbert sequence. Then there are constants $A, B > 0$ such that

$$\forall x \in H, \quad A\|x\|^2 \le \sum |(x, y(n))|^2 \le B\|x\|^2. \tag{2.13}$$

Conversely, if (2.13) holds, then $\{x(n)\}$ is both a Bessel sequence and a Hilbert sequence.

Proof. 

1) The second inequality of (2.13) is clear since $\{x(n)\}$ is a Bessel sequence, e.g., (2.12).

To verify the first inequality of (2.13), first note that the Bessel map $B$ is injective by Lemma 2.15. Since $\{x(n)\}$ is a Hilbert sequence, the Bessel map $B$ is surjective, so that, by the open mapping theorem, $B^{-1} : \ell^2(\mathbb{Z}) \to H$ is continuous. This yields the first inequality.

2) For the converse, the second inequality of (2.13) implies $\{x(n)\}$ is a Bessel sequence.

Since $\{x(n)\}$ is a Bessel sequence, the Bessel map $B$ has a well-defined continuous adjoint, $B^* : \ell^2(\mathbb{Z}) \to H$. Clearly,

$$(y, B^*c) = (By, c) = \sum (y, y(n))\,\overline{c(n)}.$$

Thus, for any $c \in \ell^2(\mathbb{Z})$, $B^*c = \sum c(n)y(n) = x \in H$; and, by the biorthonormality, $c(n) = (x, y(n))$. Therefore, $\{x(n)\}$ is also a Hilbert sequence.

■ 

Condition (2.13) in Theorem 2.16 defines $\{y(n)\}$ as a frame, cf. Section 4.

Theorem 2.17 ([47]). Given a stationary, minimal, complete sequence $\{x(n)\} \subseteq H$, with power spectrum $\mu = f_{ac}$.

1) $\{x(n)\}$ is a Bessel sequence if and only if $f_{ac}^{-1} \in L^\infty(\mathbb{T})$.

2) $\{x(n)\}$ is a Hilbert sequence if and only if $f_{ac} \in L^\infty(\mathbb{T})$.

Rozanov's Theorem 2.17 and Masani's related contributions utilize Theorem 2.5. This result, and similar ones by these authors, were proved in the multivariate case, $\mathbb{Z} \to H^d$, e.g., [53, Volume III]. There are significant problems in the multivariable case, $\mathbb{Z}^d \to H$, cf. Section 4.2.

Remark 2.18. As we have mentioned, prediction theory leads to our formulation of Section 3. At the end of Section 3 we shall discuss the role of the uncertainty principle inequalities in the context of weighted Fourier transform norm inequalities. With this in mind, we close this section with an intriguing observation by Norbert Wiener [54, p. 9].

"The  prediction  of  the  future  of  a  message  is  done  by  some  sort  of 
operator  on  its  past,  whether  this  operator  is  realized  by  a  scheme 
of  mathematical  computation,  or  by  a  mechanical  or  electrical  ap¬ 
paratus.  In  this  connection,  we  found  that  the  ideal  prediction 
mechanisms  which  we  had  at  first  contemplated  were  beset  by 
two  types  of  error,  of  a  roughly  antagonistic  nature.  While  the 
prediction  apparatus  which  we  at  first  designed  could  be  made  to 
anticipate  an  extremely  smooth  curve  to  any  desired  degree  of  ap¬ 
proximation,  this  refinement  of  behavior  was  always  attained  at  the 
cost of an increasing sensitivity. The better the apparatus was for
smooth waves, the more it would be set into oscillation by small
departures from smoothness, and the longer it would be before such
oscillation would die out. Thus the good prediction of a smooth
wave seems to require a more delicate and sensitive apparatus than
the best possible prediction of a rough curve, and the choice of the
particular  apparatus  to  be  used  in  a  specific  case  was  dependent 
on  the  statistical  nature  of  the  phenomenon  to  be  predicted.  This 
interacting  pair  of  types  of  error  seemed  to  have  something  in  com¬ 
mon  with  the  contrasting  problems  of  the  measure  of  position  and 
of  momentum  to  be  found  in  the  Heisenberg  quantum  mechanics, 
as  described  according  to  his  Principle  of  Uncertainty." 


3.  Weighted  Fourier  transform  norm  inequalities 

3.1.  Prediction  theory  and  weighted  Hilbert  transform  norm  inequalities 

Kolmogorov's conception and characterization of deterministic sequences led to a new prediction problem formulated by Helson and Szegő (1960) [33]. We shall describe this problem, its relation to the material in Section 2, and the role of the Hilbert transform. In light of Theorem 2.5, we shall deal with trigonometric approximation in $L^2_\mu(\mathbb{T})$.

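Notation 3.1. Given $\mu = f_{ac} + \mu_s \in M_+(\mathbb{T})$, we define the following subspaces of $L^2_\mu(\mathbb{T})$:

$$\mathcal{P}_0 = \overline{\mathrm{sp}}\{e^{-2\pi i k\gamma} : k \le 0\},$$

$$\mathcal{P} = \overline{\mathrm{sp}}\{e^{-2\pi i k\gamma} : k \le -1\},$$

and

$$\mathcal{F} = \overline{\mathrm{sp}}\{e^{-2\pi i k\gamma} : k \ge 1\}.$$

"$\mathcal{P}$" is for "past" and "$\mathcal{F}$" is for "future."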

The results about deterministic and minimal sequences from Theorems 2.8, 2.9, and 2.12 in Section 2 are the consequences of the following formulas developed by Szegő and Kolmogorov, respectively.

Formulas  3.2. 

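$$\inf_{p \in \mathcal{P}} \int_{\mathbb{T}} |1 + p(\gamma)|^2\, d\mu(\gamma) = \exp\left(\int_{\mathbb{T}} \log f_{ac}(\gamma)\, d\gamma\right); \tag{3.1}$$

$$\inf_{p \in \mathcal{P},\, q \in \mathcal{F}} \int_{\mathbb{T}} |1 + p(\gamma) + q(\gamma)|^2\, d\mu(\gamma) = \left(\int_{\mathbb{T}} f_{ac}(\gamma)^{-1}\, d\gamma\right)^{-1}. \tag{3.2}$$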
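As a numerical illustration of the right sides of (3.1) and (3.2), the following minimal sketch approximates both quantities by Riemann sums. The sample density $f_{ac}(\gamma) = |1 - r e^{2\pi i\gamma}|^2$ with $r = 0.5$, and the names `spectral_means` and `density`, are assumptions chosen for illustration and do not come from the text. For this density the two right sides equal $1$ and $1 - r^2$, respectively, while the arithmetic mean is $1 + r^2$.

```python
import math


def spectral_means(f, n=4096):
    """Riemann-sum approximations over [0, 1) of three means of f:
    the arithmetic mean, exp(mean of log f) as on the right side of (3.1),
    and (mean of 1/f)^(-1) as on the right side of (3.2)."""
    samples = [f(k / n) for k in range(n)]
    arithmetic = sum(samples) / n
    geometric = math.exp(sum(math.log(s) for s in samples) / n)
    harmonic = 1.0 / (sum(1.0 / s for s in samples) / n)
    return arithmetic, geometric, harmonic


R_PARAM = 0.5  # hypothetical parameter of the sample density


def density(gamma):
    """Sample density |1 - r e^{2 pi i gamma}|^2, strictly positive for r < 1."""
    return 1.0 - 2.0 * R_PARAM * math.cos(2.0 * math.pi * gamma) + R_PARAM ** 2


arith, geo, harm = spectral_means(density)
```

The ordering `harm <= geo <= arith` reflects the fact that the infimum in (3.2) is taken over a larger set of approximants than the infimum in (3.1).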

Remark 3.3. If $\log f_{ac} \notin L^1(\mathbb{T})$ then the right side of (3.1) vanishes and so $1 \in \mathcal{P}$; in fact, $\mathcal{P} = L^2_\mu(\mathbb{T})$. Similarly, if $f_{ac}^{-1} \notin L^1(\mathbb{T})$ then the right side of (3.2) vanishes and so $1 \in \overline{\mathrm{sp}}(\mathcal{P} \cup \mathcal{F})$.

In the converse direction, these formulas show that "if $f_{ac}$ is not too small," i.e., if the right sides of (3.1) and (3.2) are positive, "then the exponentials possess a certain kind of independence" [33, p. 108]. Intuitively, this means, for example, that $\mathcal{P} \neq \mathcal{F}$ in the case that the elements of $\mathcal{P}$ are linearly "independent" of $\mathcal{F}$, i.e., the nondeterministic case. Geometrically, this signifies that $\mathcal{P}$ and $\mathcal{F}$ are at a positive angle $\alpha$ to each other in the sense that

$$\rho = \cos\alpha = \sup\{|(p, q)| : p \in \mathcal{P},\ q \in \mathcal{F} \text{ and } \|p\|_{2,\mu}, \|q\|_{2,\mu} \le 1\} < 1. \tag{3.3}$$

The definition (3.3) is the natural Hilbert space generalization of angle from the Euclidean case, where the law of cosines is used to evaluate an angle $\alpha$ between two lines (subspaces) through the origin. Clearly, in this case, if $\alpha = 0$ then the two lines are the same. It is in this spirit that we would have the deterministic result, $\mathcal{P} = \mathcal{F}$, when $\alpha = 0$ in (3.3).

Helson and Szegő noted that the notion of independence defined by the condition, $\rho < 1$, is stronger than the independence defined by Kolmogorov's nondeterminism [33, p. 109], and discovered the following remarkable role for the Hilbert transform in prediction theory when dealing with positive angles between subspaces.

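Theorem 3.4. [33, Theorem 2, pp. 129-130] Given $\mu = f_{ac} \in L^1(\mathbb{T})$, and define the conjugate function $\tilde p(\gamma) = -i\sum_{|n| \le N} p[n]\,(\mathrm{sgn}\, n)\, e^{-2\pi i n\gamma}$ for every trigonometric polynomial $p(\gamma) = \sum_{|n| \le N} p[n]\, e^{-2\pi i n\gamma}$. There is $\rho \in (0,1)$ such that

$$\forall p \in \mathcal{P}_0 \text{ and } q \in \mathcal{P}_0, \quad \left|\mathrm{Re} \int_{\mathbb{T}} p(\gamma)\, e^{2\pi i\gamma}\, q(\gamma)\, f_{ac}(\gamma)\, d\gamma\right| \le \rho\, \|p\|_{2,\mu} \|q\|_{2,\mu}$$

if and only if there is $C > 0$ such that

$$\|\tilde p\|_{2,\mu} \le C \|p\|_{2,\mu} \tag{3.4}$$

for all trigonometric polynomials $p$. Equivalently, $\mathcal{P}_0$ and $\mathcal{F}$ are at a positive angle in $L^2_\mu(\mathbb{T})$ if and only if (3.4) holds for every real trigonometric polynomial $p$.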

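Remark 3.5. If $f$ is a trigonometric series, $\sum a_n e^{-2\pi i n\gamma}$, with conjugate series $\tilde f$, then

$$f + i\tilde f = a_0 + 2\sum_{n=1}^{\infty} a_n e^{-2\pi i n\gamma}$$

is a series of analytic type. If $f \in L^2(\mathbb{T})$ then $\tilde f \in L^2(\mathbb{T})$ and $f + i\tilde f \in H^2(\mathbb{T})$. The famous theorem of Marcel Riesz (1927) asserts

$$\forall f \in L^p(\mathbb{T}), \quad \|\tilde f\|_p \le C\|f\|_p, \tag{3.5}$$

where $p > 1$, cf. (3.4). Kolmogorov (1925) proved the $L^1(\mathbb{T})$-version of (3.5):

$$\exists C > 0 \text{ such that } \forall f \in L^1(\mathbb{T}) \text{ and } \forall \lambda > 0, \quad |\{\gamma : |\tilde f(\gamma)| > \lambda\}| \le C\lambda^{-1}\|f\|_1, \tag{3.6}$$

i.e., $\tilde f$ is of weak $L^1$-type if $f \in L^1(\mathbb{T})$. Note that the constant $C$ in (3.6) is independent of $f$.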

Definition  3.6. 

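1) The Hilbert transform of $f \in L^2(\mathbb{R})$ is the conjugate function $\tilde f$ defined by the formula

$$(\tilde f)^{\wedge}(\gamma) = -i(\mathrm{sgn}\,\gamma)\hat f(\gamma).$$

2) Since $\mathrm{sgn}\,\gamma = 2H(\gamma) - 1$ ($H$ = Heaviside function) and since the distributional Fourier transform of $\frac{i}{\pi}\,\mathrm{pv}\left(\frac{1}{t}\right)$ is $2H(\gamma) - 1$, then

$$\tilde f(t) = \frac{1}{\pi}\,\mathrm{pv}\left(\frac{1}{t}\right) * f(t).$$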

Thus,  prediction  problems  are  intimately  related  to  weighted  Hilbert 
transform  norm  inequalities. 

3.2. $A_p$-weights, the Hilbert transform,
and the fundamental theorem of calculus

Definition 3.7. For each $f \in L^1_{loc}(\mathbb{R})$, the Hardy-Littlewood (1930) maximal function $Mf$ is defined as

$$\forall t \in \mathbb{R}, \quad (Mf)(t) = \sup_{t \in I} \frac{1}{|I|} \int_I |f(u)|\, du,$$

where $I$ ranges over the nontrivial compact intervals containing $t$. The extension to $\mathbb{R}^d$ was made and used by Wiener (1939) in work on the ergodic theorem.
Theorem 3.8.

1) If $f \in L^1(\mathbb{R})$ then $Mf$ is of weak $L^1$-type, i.e., there is a constant $C > 0$ such that

$$\forall f \in L^1(\mathbb{R}) \text{ and } \forall \lambda > 0, \quad |\{t : (Mf)(t) > \lambda\}| \le C\lambda^{-1}\|f\|_1. \tag{3.7}$$

Note that the constant $C$ in (3.7) is independent of $f$.

2) If $1 < p \le \infty$ and $f \in L^p(\mathbb{R})$ then $Mf \in L^p(\mathbb{R})$ and there is a constant $C = C_p > 0$ such that

$$\forall f \in L^p(\mathbb{R}), \quad \|Mf\|_p \le C_p\|f\|_p$$

(Hardy-Littlewood, 1930).

3) If $f \in L^1(\mathbb{R})$ then

$$\lim \frac{1}{|I|} \int_I f(u)\, du = f(t) \text{ a.e.},$$

where for a given $t$, the measures of the compact intervals $I$ tend to 0 (Lebesgue, 1910).
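To make Definition 3.7 and the weak-type estimate (3.7) concrete, here is a minimal sketch of a discrete analogue: a finite sequence replaces $f$, index windows replace the compact intervals $I$, and counting measure replaces Lebesgue measure. The function names and the sample input are assumptions made for illustration; the sketch is not taken from the references cited in the text.

```python
def maximal_function(values):
    """Uncentered discrete maximal function: for each index t, the largest
    average of |values| over index windows [i, j] with i <= t <= j."""
    n = len(values)
    prefix = [0.0]
    for v in values:
        prefix.append(prefix[-1] + abs(v))
    result = []
    for t in range(n):
        best = 0.0
        for i in range(t + 1):
            for j in range(t, n):
                average = (prefix[j + 1] - prefix[i]) / (j - i + 1)
                best = max(best, average)
        result.append(best)
    return result


def level_set_size(values, lam):
    """Counting measure of the set where values exceed lam."""
    return sum(1 for v in values if v > lam)


# A single spike: the maximal function decays away from it, but slowly.
spike = [0.0, 0.0, 4.0, 0.0, 0.0]
m_spike = maximal_function(spike)
l1_norm = sum(abs(v) for v in spike)
```

For the spike, `level_set_size(m_spike, lam)` can be compared with `l1_norm / lam` for several values of `lam` to see the kind of control that (3.7) expresses: the level sets of the maximal function are larger than those of the input, but only by a bounded factor.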


Remark 3.9. In Theorem 3.8, Item 3 is a corollary of Item 1. Item 3 is that direction of the fundamental theorem of the calculus which asserts that

$$\text{"}D \circ \textstyle\int = \text{Identity,"} \tag{3.8}$$

where "$D$" is the differentiation operator and "$\int$" is the integration operator; and so Item 1 of Theorem 3.8 can be viewed as a quantitative version of (3.8). Of course, we also know that

$$\text{"}\textstyle\int \circ\, D = \text{Identity."}$$

More precisely, for compact intervals, $F$ is absolutely continuous on $[a, b]$ if and only if

$$\exists f \in L^1[a, b] \text{ such that } \forall t \in [a, b], \quad F(t) - F(a) = \int_a^t f(u)\, du.$$


Definition 3.10. Given $1 < p < \infty$ and a Borel measurable function $v > 0$ a.e. $v$ is an $A_p$-weight, written $v \in A_p$, if there is a constant $C > 0$ such that

$$\forall I \text{ (compact interval)}, \quad \left(\frac{1}{|I|}\int_I v(t)\, dt\right)\left(\frac{1}{|I|}\int_I v(t)^{-1/(p-1)}\, dt\right)^{p-1} \le C.$$

For example, $v(t) = |t|^a \in A_p$ if $-1 < a < p - 1$.

The  Ap  condition  is  precisely  what  is  needed  to  prove  a  weighted 
version  of  Item  2  of  Theorem  3.8. 

Theorem 3.11 (Muckenhoupt, 1972). Given $1 < p < \infty$ and a Borel measurable function $v > 0$ a.e. There is a constant $C > 0$ such that

$$\forall f \in L^p_v(\mathbb{R}), \quad \|Mf\|_{p,v} \le C\|f\|_{p,v}$$

if and only if

$$v \in A_p.$$

The  relation  between  the  material  of  this  subsection  and  Section  3.1  is 
made  by  the  following  result. 

Theorem 3.12 (Hunt, Muckenhoupt, Wheeden, 1973). Given $1 < p < \infty$ and a Borel measurable function $v > 0$ a.e. There is a constant $C > 0$ such that

$$\forall f \in L^p_v(\mathbb{R}), \quad \|\tilde f\|_{p,v} \le C\|f\|_{p,v}$$

if and only if

$$v \in A_p.$$

Besides the original papers, [26] provides an excellent treatment of Theorems 3.11 and 3.12, as well as subsequent related developments concerning $A_p$ and maximal and singular integral operators.


3.3.  Ap-weights  and  weighted  Fourier  transform  norm  inequalities 

We  have  seen  how  prediction  theory  leads  to  weighted  Hilbert  transform 
norm  inequalities  and  Ap  weights,  with  an  accompanying  theme  dealing 
with  the  fundamental  theorem  of  calculus. 

The  following  result  illustrates  how  Fourier  transform  inequalities 
come  into  the  picture. 

Theorem 3.13. [10] Given $1 < p \le q \le p' < \infty$ and an even, non-negative, Borel measurable function $v$, which is nondecreasing on $(0,\infty)$. There is a constant $C > 0$ such that

$\forall f \in C_c^\infty(\mathbb{R})$,

[displayed inequality not recoverable from the source] (3.10)

if and only if

$$v \in A_p,$$

cf. [5, Section 2.2.2] for a d-dimensional version.

Remark  3.14. 

1) Depending on $v$, the quantitative expression for $\hat f$ is not transparent in the case $f \in L^p_v(\mathbb{R}) \setminus (L^1(\mathbb{R}) \cap L^p_v(\mathbb{R}))$, e.g., [5], [8], and recent work with my student, J. Lakey [41].

2) If $p = q = 2$ then (3.10) becomes

$$\|\hat f\|_{2,v(1/\gamma)} \le C\|f\|_{2,v(t)}. \tag{3.11}$$

If we define the Kelvin operator,

$$Kf(t) = \frac{1}{|t|}\,\hat f\left(\frac{1}{t}\right),$$

then (3.11) becomes

$$\|Kf\|_{2,v} \le C\|f\|_{2,v}.$$

There is a corresponding inequality in terms of $K$ for the general case (3.10). Harmonicity in $\mathbb{R}^2$ is invariant under any conformal mapping. This fact is not valid in $\mathbb{R}^d$, $d > 2$, and Kelvin transformations are used to provide the invariance of harmonicity in these higher dimensions as well as $\mathbb{R}^2$ (W. Thomson, Lord Kelvin, 1847).


Example  3.15. 


1) If $p = q$ and $v = 1$ then (3.10) is the Hardy, Littlewood, Paley theorem (1931),

$$\left(\int |\hat f(\gamma)|^p |\gamma|^{p-2}\, d\gamma\right)^{1/p} \le C\|f\|_p, \tag{3.12}$$

originally proved for Fourier series.

2) If $q = p'$ and $v = 1$ then (3.10) is the Hausdorff-Young (1923), Titchmarsh theorem (1924),

$$\|\hat f\|_{p'} \le C\|f\|_p. \tag{3.13}$$

Hausdorff-Young proved (3.13) for Fourier series, and Titchmarsh proved it for Fourier transforms.

3) If $v(t) = |t|^a$, $0 \le a < p - 1$, then (3.10) is Pitt's theorem (1937),

$$\left(\int |\hat f(\gamma)|^q |\gamma|^{-\beta}\, d\gamma\right)^{1/q} \le C\left(\int |f(t)|^p |t|^a\, dt\right)^{1/p}, \tag{3.14}$$

where $\beta = \frac{q}{p}(a + 1) + 1 - q$. The result was originally proved for Fourier series. If $p = q$ and $a = 0$ then (3.14) reduces to (3.12). If $q = p'$ and $a = 0$ then $\beta = 0$ and (3.14) reduces to (3.13).

4)  The  result  in  Item  1  was  first  proved  by  Hardy  and  Littlewood,  but 
in  the  same  year  Paley  proved  it  for  uniformly  bounded  orthonormal 
systems.  Paley's  ideas  are  significant  and  deal  with  rearrangements, 
cf.  J.E.  Littlewood,  "On  a  theorem  of  Paley,"  JLMS,  29(1954),  387-395. 
Salem  and  Zygmund  proved  (3.12)  for  p  =  1  when  the  given  Fourier 
series  is  of  analytic  type  (BAMS,  55(1949),  851-859). 
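The two reductions of Pitt's theorem stated in Item 3 can be checked with exact rational arithmetic. The sketch below assumes the exponent in (3.14) is $\beta = \frac{q}{p}(a+1) + 1 - q$; the helper names are hypothetical and serve only this check.

```python
from fractions import Fraction


def pitt_beta(p, q, a):
    """Exponent beta = (q/p)(a + 1) + 1 - q in the weight |gamma|^(-beta)."""
    p, q, a = Fraction(p), Fraction(q), Fraction(a)
    return q / p * (a + 1) + 1 - q


def conjugate_exponent(p):
    """Conjugate exponent p' defined by 1/p + 1/p' = 1, for p > 1."""
    p = Fraction(p)
    return p / (p - 1)


sample_p = [Fraction(3, 2), Fraction(5, 4), Fraction(2)]

# Case p = q, a = 0: the weight |gamma|^(-beta) should equal |gamma|^(p - 2).
hlp_case = [pitt_beta(p, p, 0) == 2 - p for p in sample_p]

# Case q = p', a = 0: beta should vanish, leaving an unweighted inequality.
hy_case = [pitt_beta(p, conjugate_exponent(p), 0) == 0 for p in sample_p]
```

Both lists contain only `True`, which agrees with the statements that (3.14) reduces to (3.12) when $p = q$, $a = 0$, and to (3.13) when $q = p'$, $a = 0$.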


3.4.  Weighted  Fourier  transform  norm  inequalities 
and  the  uncertainty  principle 

Our path from prediction theory led us in Section 3.3 to $A_p$ weights which characterize special weighted Fourier transform norm inequalities. The next step is to see what is involved in establishing general weighted Fourier transform norm inequalities.

H.  Heinig  and  I  proved  the  following  result  during  the  summer  of 
1982  here  in  Toscana  (as  well  as  in  North  America).  Similar  results  were 
being  proved  during  the  same  period  by  Muckenhoupt  and  by  Jurkat 
and  Sampson. 


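Theorem 3.16 ([9]). Given $1 \le p \le q < \infty$ and two even, non-negative, Borel measurable functions $u$ and $v$, which are nonincreasing and nondecreasing, respectively, on $(0,\infty)$. There is a constant $C = C(K) > 0$ such that

$$\forall f \in C_c^\infty(\mathbb{R}), \quad \|\hat f\|_{q,u} \le C\|f\|_{p,v} \tag{3.15}$$

if and only if

$$\sup_{s > 0} \left(\int_0^{1/s} u(\gamma)\, d\gamma\right)^{1/q} \left(\int_0^{s} v(t)^{-p'/p}\, dt\right)^{1/p'} = K < \infty. \tag{3.16}$$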


Remark  3.17. 

1) Naturally, from general considerations, (3.15) allows us to define $\hat f$ for each $f \in L^p_v(\mathbb{R})$, with the same caveat to which we alluded in Item 1 of Remark 3.14.

2) Our proof of the necessary conditions for (3.15), viz., the implication that (3.15) implies (3.16), does not require monotonicity.

3)  Given  our  present  point  of  view  of  tracing  mathematical  paths  from 
Kolmogorov's  seminal  work,  we  should  point  out  that  Theorem  3.16 
was  used  in  our  proof  of  Theorem  3.13. 

Qualitatively, the condition (3.16) is in the spirit of the uncertainty principle. In the early 1980s, Heinig and I verified an elementary weighted Heisenberg inequality by means of Theorem 3.16. In a recent NATO ASI, we developed a full theory of uncertainty principle inequalities, taking into account significant work of others, working in the context of weighted Fourier transform norm inequalities, and utilizing wavelets and coherent states [5].

Theorem 3.16 is just the starting point for weighted Fourier transform norm inequalities. The theory has been highly developed in the past decade by many harmonic analysts, and is naturally akin to the topic of restriction theorems where geometry plays such a critical role. The goal is to characterize norm inequalities such as (3.15) both effectively and computationally, and for the most general class of weights. Our most recent contribution [8] deals with effective criteria, i.e., no rearrangements, and measure weights; it also contains references to recent contributions by others, cf. [41].


4.  Stationary  frames 


4.1.  The  theory  of  frames 


Definition  4.1. 

1) A sequence $\{x(n)\}$ in Hilbert space $H$ is a frame if there exist $A, B > 0$ such that

$$\forall y \in H, \quad A\|y\|^2 \le \sum |(y, x(n))|^2 \le B\|y\|^2,$$

where $(\cdot,\cdot)$ is the inner product on $H$ and the norm of $y \in H$ is $\|y\| = (y, y)^{1/2}$. $A$ and $B$ are the frame bounds, and a frame $\{x(n)\}$ is tight if $A = B$. A frame $\{x(n)\}$ is exact if it is no longer a frame when any one of its elements is removed. Clearly, if $\{x(n)\}$ is an orthonormal basis of $H$ then it is a tight exact frame with $A = B = 1$.

2) The frame operator of the frame $\{x(n)\}$ is the function $S : H \to H$ defined as $Sy = \sum (y, x(n))x(n)$.
The theory of frames is due to Duffin and Schaeffer [19] in 1952. Expositions include the book by Young [56] and the article by my students C. Heil and D. Walnut [29]; the former is presented in the context of nonharmonic Fourier series and the latter in the setting of wavelet theory.

Theorem 4.2. Let $\{x(n)\} \subseteq H$ be a frame with frame bounds $A$ and $B$.

1) $S$ is a topological isomorphism with inverse $S^{-1} : H \to H$. $\{(S^{-1}x)(n)\}$ is a frame with frame bounds $B^{-1}$ and $A^{-1}$, and

$$\forall y \in H, \quad y = \sum (y, (S^{-1}x)(n))\,x(n) = \sum (y, x(n))\,(S^{-1}x)(n).$$

The first expansion is the frame expansion and the second is the dual frame expansion.

2) If $\{x(n)\}$ is tight, $\|x(n)\| = 1$ for all $n$, and $A = B = 1$, then $\{x(n)\}$ is an orthonormal basis of $H$.

3) If $\{x(n)\}$ is exact, then $\{x(n)\}$ and $\{(S^{-1}x)(n)\}$ are biorthonormal, i.e.,

$$\forall m, n, \quad (x(m), (S^{-1}x)(n)) = \delta_{mn}.$$
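A finite-dimensional example may help fix the definitions of frame operator and frame expansion. The sketch below works in $H = \mathbb{R}^2$ with three unit vectors at 120 degree spacing; this choice of vectors and all function names are assumptions made for illustration. The frame operator turns out to be $\frac{3}{2}$ times the identity, so the frame is tight with $A = B = 3/2$, yet it is not an orthonormal basis, which is consistent with the hypothesis $A = B = 1$ in Item 2.

```python
import math


def frame_operator(vectors):
    """2x2 matrix of S y = sum_n (y, x(n)) x(n) for real vectors in R^2."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for x in vectors:
        for i in range(2):
            for j in range(2):
                s[i][j] += x[i] * x[j]
    return s


def inverse_2x2(m):
    """Inverse of an invertible 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def apply(m, v):
    """Matrix-vector product for a 2x2 matrix."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def reconstruct(y, vectors):
    """Frame expansion y = sum_n (y, S^{-1} x(n)) x(n)."""
    s_inv = inverse_2x2(frame_operator(vectors))
    out = [0.0, 0.0]
    for x in vectors:
        coeff = dot(y, apply(s_inv, x))
        out[0] += coeff * x[0]
        out[1] += coeff * x[1]
    return out


# Three unit vectors at 120 degree spacing (a tight frame for R^2).
angles = [math.pi / 2 + 2 * math.pi * k / 3 for k in range(3)]
tight_frame = [[math.cos(t), math.sin(t)] for t in angles]
# A frame that is not tight, to show that the expansion still reconstructs.
skew_frame = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

S_tight = frame_operator(tight_frame)
y = [0.3, -1.2]
y_from_tight = reconstruct(y, tight_frame)
y_from_skew = reconstruct(y, skew_frame)
```

Neither frame is exact: removing any one vector from either leaves a basis of $\mathbb{R}^2$, hence still a frame.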


Remark 4.3. We comment on Item 2 because it is surprisingly useful and because of a stronger result by Vitali (1921) [51].

To prove Item 2 we first use tightness and $A = 1$ to write,

$$\|x(m)\|^2 = \|x(m)\|^4 + \sum_{n \neq m} |(x(m), x(n))|^2;$$

and obtain that $\{x(n)\}$ is orthonormal since each $\|x(n)\| = 1$. To conclude the proof we then invoke the well-known result: if $\{x(n)\} \subseteq H$ is orthonormal then it is an orthonormal basis of $H$ if and only if

$$\forall y \in H, \quad \|y\|^2 = \sum |(y, x(n))|^2.$$

In 1921, Vitali proved that an orthonormal sequence $\{g_n\} \subseteq L^2[a, b]$ is complete, and so $\{g_n\}$ is an orthonormal basis, if and only if

$$\forall t \in [a, b], \quad \sum_n \left|\int_a^t g_n(u)\, du\right|^2 = t - a. \tag{4.1}$$

For the case $H = L^2[a, b]$, Vitali's result is stronger than Item 2 since (4.1) is tightness with $A = 1$ for functions $f = \mathbf{1}_{[a,t]}$.
Definition 4.4. Let $H$ be a complex separable Hilbert space. A sequence $\{x(n)\} \subseteq H$ is a Schauder basis or basis of $H$ if each $y \in H$ has a unique decomposition $y = \sum c_n(y)x(n)$. A basis $\{x(n)\}$ is an unconditional basis if

$\exists C$ such that $\forall F \subseteq \mathbb{Z}$, where card $F < \infty$, and $\forall b_n, c_n \in \mathbb{C}$, where $n \in F$ and $|b_n| \le |c_n|$,

$$\left\|\sum_{n \in F} b_n x(n)\right\| \le C\left\|\sum_{n \in F} c_n x(n)\right\|.$$

An unconditional basis $\{x(n)\}$ is bounded if

$$\exists A, B > 0 \text{ such that } \forall n, \quad A \le \|x(n)\| \le B.$$

Separable  Hilbert  spaces  have  orthonormal  bases,  and  orthonormal 
bases  are  bounded  unconditional  bases. 

Köthe (1936) proved the implication, Item 2 implies Item 3, of the following theorem. The implication, Item 3 implies Item 2, is straightforward; and the equivalence of Item 1 and Item 3 is found in [56, pp. 188-189].


Theorem 4.5. Let $H$ be a complex separable Hilbert space and let $\{x(n)\} \subseteq H$ be a given sequence. The following are equivalent:

1) $\{x(n)\}$ is an exact frame for $H$;

2) $\{x(n)\}$ is a bounded unconditional basis of $H$;

3) $\{x(n)\}$ is a Riesz basis, i.e., there is an orthonormal basis $\{u(n)\}$ and a topological isomorphism $T : H \to H$ such that $Tx(n) = u(n)$ for each $n$.


Theorem 4.6. Let $H$ be a separable Hilbert space and let $\{x(n)\} \subseteq H$ be a frame. $\{x(n)\}$ is an exact frame $\iff$ $\{x(n)\}$ is a minimal sequence.

Proof.

($\Rightarrow$) Since $\{x(n)\}$ is exact we have that $\{x(n)\}$, $\{(S^{-1}x)(n)\}$ are biorthonormal [19], cf. [29, p. 637]. By Item 2 of Definition 2.13, $\{x(n)\}$ is minimal.

($\Leftarrow$) Since $\{x(n)\}$ is minimal then

$$\forall p, \quad x(p) \notin \overline{\mathrm{sp}}\{x(n) : n \neq p\}. \tag{4.2}$$

To prove $\{x(n)\}$ is an exact frame we must show that each $\{x(n) : n \neq p\}$ is not a frame. If any $\{x(n) : n \neq p\}$ were a frame, then by Theorem 4.2 $x(p) = \sum_{n \neq p} c_n x(n)$, and this fails by (4.2). ■

Corollary 4.7. Given a stationary, minimal, complete sequence $x : \mathbb{Z} \to H$. Then $\{x(n)\}$ is both a Bessel sequence and a Hilbert sequence if and only if $\{x(n)\}$ is a bounded unconditional basis.

Proof. Bessel-Hilbert sequences are frames by Theorem 2.16. Thus, $\{x(n)\}$ is a bounded unconditional basis by Theorems 4.5 and 4.6. The converse is immediate by Theorems 2.16 and 4.5. ■

Example 4.8. Given $g \in L^2(\mathbb{R}^d)$ and $a = (a_1, \ldots, a_d)$, $b = (b_1, \ldots, b_d) \in \mathbb{R}^d$. Assume each $a_j, b_k > 0$. Define the translation and modulation maps,

$$T_{na}f(t) = f(t - na) \quad \text{and} \quad E_{mb}f(t) = e^{2\pi i t \cdot mb} f(t),$$

respectively, where $m, n \in \mathbb{Z}^d$, $f \in L^2(\mathbb{R}^d)$, $na = (n_1 a_1, \ldots, n_d a_d)$, and $mb = (m_1 b_1, \ldots, m_d b_d)$. The Weyl-Heisenberg system $\{\varphi_{m,n} : m, n \in \mathbb{Z}^d\}$ is defined by

$$\varphi_{m,n} = E_{mb}T_{na}g,$$

cf. [11, Definition 2.6] for a generalization. If $\{\varphi_{m,n}\}$ is a frame for $L^2(\mathbb{R}^d)$ it is called a Weyl-Heisenberg or Gabor frame (of coherent states).

Remark 4.9. Given $g \in L^2(\mathbb{R}^d)$ and $a, b > 0$ for which $ab = 1$. If $\{\varphi_{m,n}\}$ is a frame then it is an exact frame, cf. Theorem 4.5. This remarkable fact (for $ab = 1$) can be proved using properties of the Zak transform, which we now define.

Definition/Property  4.10. 

1) Given $a = (a_1, \ldots, a_d) \in \mathbb{R}^d$, with each $a_j > 0$. The Zak transform of $f \in L^1_{loc}(\mathbb{R}^d)$ is formally defined as

$$Gf(x, \omega) = a^{1/2} \sum_{k \in \mathbb{Z}^d} f(xa + ka)\, e^{2\pi i k \cdot \omega}, \quad (x, \omega) \in \mathbb{R}^d \times \mathbb{R}^d, \tag{4.3}$$

where multiplication is component-wise and $a^{1/2} = \prod_j a_j^{1/2}$. Its history has been traced to Gauss, from whence the "G" in (4.3); and in recent times there have been independent formulations by Auslander-Tolimieri, Brezin-Weil, and, of course, Zak, cf. [36].

2) Formally, $Gf$ is quasi-periodic in the sense that

$$Gf(x + n, \omega) = e^{-2\pi i n \cdot \omega}\, Gf(x, \omega) \quad \text{and} \quad Gf(x, \omega + n) = Gf(x, \omega)$$

for $(x, \omega) \in \mathbb{R}^d \times \mathbb{R}^d$ and $n \in \mathbb{Z}^d$.

The proof of Theorem 4.11 is straightforward, beginning with an elementary calculation verifying that

$$\forall f \in C_c(\mathbb{R}^d), \quad \|Gf\|_{L^2(\mathbb{T}^d \times \mathbb{T}^d)} = \|f\|_2.$$
Theorem 4.11. $G : L^2(\mathbb{R}^d) \to L^2(\mathbb{T}^d \times \mathbb{T}^d)$ is a unitary map.

Given $a, b \in \mathbb{R}^d$ with each $a_j, b_k > 0$. If $g \in L^2(\mathbb{R}^d)$ and $ab = 1$, i.e., $a_j b_j = 1$ for each $j = 1, \ldots, d$, then

$$G(E_{mb}T_{na}g)(x, \omega) = E_m(x)E_n(\omega)\,Gg(x, \omega) = E_{m,n}(x, \omega)\,Gg(x, \omega), \quad (x, \omega) \in \mathbb{T}^d \times \mathbb{T}^d. \tag{4.4}$$

Note that $\{E_{m,n}\}$ is an orthonormal basis of $L^2(\mathbb{T}^d \times \mathbb{T}^d)$. By isolating $Gg$ (from $G(E_{mb}T_{na}g)$) when considering $\{\varphi_{m,n}\}$, it is clear that (4.4), in conjunction with Theorem 4.11, plays a role in the following result. This result (Theorem 4.12) has had partial formulations in the coherent states literature for many years, and seems to go back to the analysis of von Neumann found in […1, pp. 405ff.], cf. [18], [28, Proposition 7.3.4], [29, Theorem 1.3.3], [35].


Theorem 4.12. Given $g \in L^2(\mathbb{R}^d)$ and $a, b \in \mathbb{R}^d$ with each $a_j, b_k > 0$. Assume $ab = 1$ and consider the Weyl-Heisenberg system $\{\varphi_{m,n}\}$ (defined in Example 4.8).

1) $\{\varphi_{m,n}\}$ is complete in $L^2(\mathbb{R}^d)$ if and only if $Gg \neq 0$ a.e.

2) $\{\varphi_{m,n}\}$ is minimal and complete in $L^2(\mathbb{R}^d)$ if and only if $1/Gg \in L^2(\mathbb{T}^d \times \mathbb{T}^d)$.

3) $\{\varphi_{m,n}\}$ is an orthonormal basis of $L^2(\mathbb{R}^d)$ if and only if $|Gg| = 1$ a.e.

4) $\{\varphi_{m,n}\}$ is a frame for $L^2(\mathbb{R}^d)$ with frame bounds $A$ and $B$ if and only if

$$A \le |Gg|^2 \le B \text{ a.e.}$$

In this case, $\{\varphi_{m,n}\}$ is an exact frame.

Item 1 of Theorem 4.12 should be compared with Corollary 4.7 and Section 4.3, where we note that $\{\varphi_{m,n}\}$ is stationary in the case $ab = 1$.


Example 4.13. Given $\psi \in L^2(\mathbb{R})$. The wavelet system $\{\psi_{m,n} : m, n \in \mathbb{Z}\}$ is defined as

$$\psi_{m,n}(t) = 2^{m/2}\,\psi(2^m t - n).$$

If $\{\psi_{m,n}\}$ is a frame for $L^2(\mathbb{R})$ it is called a wavelet frame.


4.2.  Multidimensional  analogues  of  classical  analysis  problems 

The extension of the Kolmogorov or Szegő or Wiener prediction theory to multidimensional domains is a natural problem, and has been and is being pursued, e.g., [15, 37]. Chiang's work [15] precedes the well-known contribution of Helson and Lowdenslager. There has been a proliferation of results dealing with specific topics and diverse levels of abstraction. (For example, [20, Preface] provides references for Markovian properties in this setting.) From the mathematical point of view, the work of Helson and Lowdenslager [31, 32, 30] is preeminent. One can reasonably argue its central position in the development of abstract analysis for a generation, and its influence on such topics as locally compact abelian groups, Dirichlet algebras, von Neumann algebras, and $H^p$-theory, e.g., [53, Volume III, pp. 347ff.] for what now are classical (or at least standard) references. With all of this, there are still basic multidimensional prediction problems.

Our  goal  in  this  section  is  to  exhibit  a  little  bit  of  the  multidimensional 
evolution  of  one  particular  classical  problem,  which  played  a  role  in  [39].  At 
the  very  least,  it  gives  us  the  opportunity  to  advertise  two  recent  and  deep 
contributions  by  Benedicks  [12]  and  Gabardo  [23].  Our  slightly  broader 
goal  in  Section  4  is  to  suggest  an  interleaving  of  technology  between  the 
theories  of  frames  and  prediction,  with  the  hope  of  bringing  new  techniques 
to  bear  on  the  problems  in  each  area.  Our  method  will  become  apparent  in 
Section  4.3. 

The  role  of  the  F.  and  M.  Riesz  theorem  in  one  of  Kolmogorov's  spectral 
characterizations,  viz.,  [39,  Theorem  23],  was  discussed  in  Remark  2.10. 

Theorem 4.14 (F. and M. Riesz, 1916). Given $\mu \in M(\mathbb{T})$ and assume $\hat\mu(n) = 0$ for all $n < 0$. Then $\mu = f_{ac}$, i.e., $\mu \in L^1(\mathbb{T})$.

Three  years  after  Kolmogorov's  paper  [39],  Bochner  published  the  fol¬ 
lowing  result. 

Theorem 4.15 (Bochner, 1944). Given $\mu \in M(\mathbb{T}^2)$ and assume $\hat\mu$ vanishes outside a sector $S \subseteq \mathbb{Z}^2$ of opening $\alpha_S < \pi$. Then $\mu \in L^1(\mathbb{T}^2)$. (Precisely, $S$ is a closed sector of $\mathbb{R}^2$.)

Bochner's work was not only an inspiration for Helson and Lowdenslager's program, in which they generalized Szegő's theorem dramatically, but in [31] they proved a generalization of the F. and M. Riesz Theorem of which Theorem 4.15 is a corollary, e.g., [48, Chapter 8], a book where many of us began. Instead of the duality between $\mathbb{T}$ and $\mathbb{Z}$, the setting in [31] is the duality between a compact connected abelian group $\Gamma$ and its discrete dual group $G$ in the case $G$ is ordered, e.g., [48, pp. 193-194] for the definition of ordered group.

Ordered groups also arise in the theory of Bohr almost periodic functions, e.g., [53, Volume III, pp. 347-348].
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4.3.  Stationary  frames 
Definition/Remark  4.16. 

1) A sequence $\{x(n) : n \in \mathbb{Z}^d\}$ in a complex Hilbert space $H$ is stationary if the inner product,

$$R(m) = R_{xx}(m) = \langle x(m+k), x(k) \rangle, \quad m \in \mathbb{Z}^d,$$

is independent of $k \in \mathbb{Z}^d$, i.e.,

$$\forall m, n \in \mathbb{Z}^d, \quad \langle x(m), x(n) \rangle = \langle x(m-n), x(0) \rangle.$$

Thus, $\{x(n)\}$ is stationary if the function $s : \mathbb{Z}^d \times \mathbb{Z}^d \to \mathbb{C}$, defined as

$$s(m,n) = \langle x(m), x(n) \rangle,$$

has the form $s(m,n) = s(m-n)$. In this case, $R_{xx}(m-n) = s(m-n)$, and $R_{xx}(m) = s(m-0) = s(m,0) = \langle x(m), x(0) \rangle$. $R_{xx}$ is the autocorrelation of $x$ and is a positive definite function on the group $\mathbb{Z}^d$. $\mu = \hat{R}_{xx} \in M_+(\mathbb{T}^d)$ is the power spectrum of $x$.

2) The analogue of Theorem 2.5 is valid for stationary sequences $x : \mathbb{Z}^d \to H$, where $H(x) = \overline{\mathrm{sp}}\{x(n) : n \in \mathbb{Z}^d\}$: the mapping

$$Z : L^2_\mu(\mathbb{T}^d) \to H(x),$$

defined linearly on $\mathrm{sp}\{e^{2\pi i n \cdot \gamma}\}$ by $Z(e^{2\pi i n \cdot \gamma}) = x(n)$, extends to a linear isometric isomorphism.

3) A stationary sequence $\{x(n) : n \in \mathbb{Z}^d\} \subseteq H$ is a stationary frame if $\{x(n) : n \in \mathbb{Z}^d\}$ is a frame in $H$.
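
The definition of stationarity can be checked numerically in a small finite model. The following is only a sketch under assumptions that are not in the text: the index group $\mathbb{Z}^d$ is replaced by the finite cyclic group $\mathbb{Z}_N$, the Hilbert space is $\mathbb{C}^N$, and the sequence is the orbit $x(n) = U^n v$ of a vector under the cyclic shift $U$. The names `shift_orbit`, `gram`, `autocorrelation`, and `power_spectrum` are illustrative.

```python
import numpy as np

def shift_orbit(v):
    """Rows are x(n) = U^n v, n = 0..N-1, where U is the cyclic shift on C^N."""
    return np.array([np.roll(v, n) for n in range(len(v))])

def gram(x):
    """s(m, n) = <x(m), x(n)>, conjugate-linear in the second slot."""
    return x @ x.conj().T

def autocorrelation(x):
    """R(m) = <x(m), x(0)>."""
    return x @ x[0].conj()

def power_spectrum(R):
    """Weights mu(j) on Z_N with R(m) = sum_j mu(j) * exp(2*pi*i*m*j/N)."""
    return np.fft.fft(R) / len(R)

rng = np.random.default_rng(0)
N = 8
v = rng.standard_normal(N) + 1j * rng.standard_normal(N)
x = shift_orbit(v)
s, R = gram(x), autocorrelation(x)
mu = power_spectrum(R)

# Stationarity: s(m, n) depends only on (m - n) mod N.
diff = (np.arange(N)[:, None] - np.arange(N)[None, :]) % N
print(np.allclose(s, R[diff]))
# Positive definiteness of R shows up as a real, nonnegative spectrum.
print(np.allclose(mu.imag, 0.0), mu.real.min() >= -1e-12)
```

In this toy setting the power spectrum is a finite set of nonnegative weights, the discrete counterpart of a positive measure on $\mathbb{T}^d$.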


Example  4.17. 

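1) Given $g \in L^2(\mathbb{R}^d)$ and $a, b \in \mathbb{R}^d$ with each $a_j, b_j > 0$. Consider the Weyl-Heisenberg system $\{\varphi_{m,n} : m, n \in \mathbb{Z}^d\}$, where

$$\varphi_{m,n} = E_{mb} T_{na}\, g.$$

If $ab = 1$, i.e., $a_j b_j = 1$ for each $j = 1, \ldots, d$, then the sequence $x : \mathbb{Z}^d \times \mathbb{Z}^d \to L^2(\mathbb{R}^d)$ defined by $x(m,n) = \varphi_{m,n}$ is a stationary sequence.

In fact,

$$\begin{aligned}
\langle \varphi_{m,n}, \varphi_{p,q} \rangle
&= \int e^{2\pi i t \cdot (m-p)b}\, g(t - na)\, \overline{g(t - qa)}\, dt \\
&= \int e^{2\pi i (u + qa) \cdot (m-p)b}\, g(u - (n-q)a)\, \overline{g(u)}\, du \\
&= \int e^{2\pi i u \cdot (m-p)b} \exp\Big( 2\pi i \sum_{j=1}^{d} q_j (m_j - p_j) a_j b_j \Big)\, g(u - (n-q)a)\, \overline{g(u)}\, du \\
&= \int e^{2\pi i u \cdot (m-p)b}\, g(u - (n-q)a)\, \overline{g(u)}\, du \\
&= \langle \varphi_{m-p,\,n-q}, g \rangle = \langle \varphi_{m-p,\,n-q}, \varphi_{0,0} \rangle,
\end{aligned}$$

and this is the desired stationarity. Of course, the positive definiteness follows from general considerations, viz.,

$$\sum c_{m,n}\, \overline{c_{p,q}}\, R\big((m,n) - (p,q)\big)
= \sum c_{m,n}\, \overline{c_{p,q}}\, \langle \varphi_{m,n}, \varphi_{p,q} \rangle
= \Big\langle \sum_{m,n} c_{m,n} \varphi_{m,n}, \sum_{p,q} c_{p,q} \varphi_{p,q} \Big\rangle \geq 0.$$

Thus, $\hat{R} = \mu \in M_+(\mathbb{T}^d \times \mathbb{T}^d)$.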
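2) In the case of Item 1,

$$R(m,n) = \int e^{2\pi i u \cdot mb}\, (T_{na} g)(u)\, \overline{g(u)}\, du.$$

Clearly, $|R(m,n)| \leq \|g\|^2_{L^2(\mathbb{R}^d)}$ for $(m,n) \in \mathbb{Z}^d \times \mathbb{Z}^d$. Also, for each $n \in \mathbb{Z}^d$, $(T_{na} g)\, \overline{g} \in L^1(\mathbb{R}^d)$, and so

$$\forall n, \quad \lim_{|m| \to \infty} |R(m,n)| = 0$$

by the Riemann-Lebesgue lemma. Here, thinking in terms of locally compact abelian groups, $\lim_{|m| \to \infty} |r(m)| = 0$ indicates that for all $\varepsilon > 0$, there is a finite set $K \subseteq \mathbb{Z}^d$ such that

$$\forall m \in \mathbb{Z}^d \setminus K, \quad |r(m)| < \varepsilon.$$

Further, by Parseval's theorem,

$$R(m,n) = \int e^{-2\pi i na \cdot \gamma}\, \hat{g}(\gamma - mb)\, \overline{\hat{g}(\gamma)}\, d\gamma;$$

and so, as above,

$$\forall m, \quad \lim_{|n| \to \infty} |R(m,n)| = 0.$$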
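
The role of the condition $ab = 1$ in Example 4.17 can be seen numerically for $d = 1$. The sketch below makes assumptions that are not in the text: a Gaussian window $g(t) = e^{-\pi t^2}$, a Riemann-sum approximation of the $L^2$ inner product on a truncated grid, and the illustrative names `make_system` and `stationarity_defect`. It compares $\langle \varphi_{m,n}, \varphi_{p,q} \rangle$ with $\langle \varphi_{m-p,n-q}, \varphi_{0,0} \rangle$.

```python
import numpy as np

def make_system(a, b, h=1.0 / 64, half_width=40.0):
    """Return (phi, inner) for the d = 1 Weyl-Heisenberg system sampled on a grid."""
    t = np.arange(-half_width, half_width, h)

    def g(u):
        # Gaussian window (an assumption made for this illustration).
        return np.exp(-np.pi * u**2)

    def phi(m, n):
        # phi_{m,n}(t) = exp(2*pi*i*t*m*b) * g(t - n*a)
        return np.exp(2j * np.pi * t * m * b) * g(t - n * a)

    def inner(f1, f2):
        # Riemann-sum approximation of the L^2 inner product.
        return np.sum(f1 * np.conj(f2)) * h

    return phi, inner

def stationarity_defect(a, b, m, n, p, q):
    """|<phi_{m,n}, phi_{p,q}> - <phi_{m-p,n-q}, phi_{0,0}>|."""
    phi, inner = make_system(a, b)
    return abs(inner(phi(m, n), phi(p, q)) - inner(phi(m - p, n - q), phi(0, 0)))

print(stationarity_defect(2.0, 0.5, 2, 1, 1, 1))  # ab = 1: numerically negligible
print(stationarity_defect(1.0, 0.5, 1, 1, 0, 1))  # ab = 1/2: the phase factor is -1
```

When $ab \neq 1$ the factor $\exp(2\pi i\, q (m - p) ab)$ from the change of variables need not equal 1, which is exactly where the computation in Item 1 uses the hypothesis.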

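Example 4.18. Given $\psi \in L^2(\mathbb{R})$ and the wavelet system $\{\psi_{m,n}\}$. We compute the following:

$$\begin{aligned}
\langle \psi_{m,n}, \psi_{p,q} \rangle
&= 2^{(m+p)/2} \int \psi(2^m t - n)\, \overline{\psi(2^p t - q)}\, dt \\
&= 2^{(m+p)/2} \int \psi\big(2^{m-p}(u + q) - n\big)\, \overline{\psi(u)}\, 2^{-p}\, du \\
&= 2^{(m-p)/2} \int \psi\big(2^{m-p} u - (n - 2^{m-p} q)\big)\, \overline{\psi(u)}\, du.
\end{aligned}$$

Thus, $\{\psi_{m,n}\}$ is not a stationary sequence since $n - 2^{m-p} q \neq n - q$ unless $m = p$.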

Besides Example 4.17, there is another relationship between Weyl-Heisenberg systems and power spectra which we first proved in the AMS series Contemporary Mathematics, 91 (1989), pp. 9-27.

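Theorem 4.19. Given $g \in L^\infty(\mathbb{R})$. Define the (analogue) Weyl-Heisenberg system,

$$\varphi_{\omega,x}(t) = E_\omega T_x\, g(t) = e^{2\pi i t \omega} g(t - x),$$

$(x, \omega) \in \mathbb{R} \times \hat{\mathbb{R}}$, and the $L^1$-Weyl-Heisenberg transform,

$$\forall f \in L^1(\mathbb{R}), \quad W(f)(x, \omega) = \int f(t)\, \overline{\varphi_{\omega,x}(t)}\, dt.$$

Assume $g$ has a continuous autocorrelation,

$$\forall t \in \mathbb{R}, \quad R(t) = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} g(t + u)\, \overline{g(u)}\, du,$$

with $R(0) > 0$; and let $\{\rho_n\} \subseteq L^1(\hat{\mathbb{R}})$ have the property that $\{\rho_n^{\vee}\} \subseteq L^1(\mathbb{R})$ is an $L^1$-approximate identity. Then

$$\forall f \in L^1(\mathbb{R}), \quad \lim_{n \to \infty} \|f - f_n\|_1 = 0,$$

where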

Remark 4.20. If $g = 1$ then the $L^1$-Weyl-Heisenberg transform is the ordinary Fourier transform; and Theorem 4.19 is the usual $L^1$-inversion theorem for the Fourier transform.

The autocorrelation defined in Theorem 4.19 corresponds to the autocorrelation defined for stationary sequences in the case of correlation ergodic
processes.  Also,  it  is  possible  to  substitute  other  modes  of  convergence  in 
the  definition  of  autocorrelation  and  still  obtain  Theorem  4.19. 

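Point of view. Given the stationary sequence $\{x(m,n)\}$, where $x(m,n) = \varphi_{m,n}$ for some $g \in L^2(\mathbb{R}^d)$ and $ab = 1$; and suppose

$$H(x) = L^2(\mathbb{R}^d),$$

where $H(x) = \overline{\mathrm{sp}}\{x(m,n) : (m,n) \in \mathbb{Z}^d \times \mathbb{Z}^d\}$. If $\mu$ is the power spectrum of $x$ then

$$Z : L^2_\mu(\mathbb{T}^d \times \mathbb{T}^d) \to L^2(\mathbb{R}^d)$$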

is  a  unitary  map,  as  is  the  Zak  transform, 

and  the  induced  map, 

GoZ:L^.C.r‘’  x  ^  .jd, 

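In this last case,

$$G \circ Z(E_{m,n})(x, \omega) = E_{m,n}(x, \omega)\, Gg(x, \omega),$$

cf. (4.4). Note that $\mu \geq 0$ is a periodic measure on $\mathbb{R}^d \times \hat{\mathbb{R}}^d$, and $Gf$ is quasi-periodic on $\mathbb{R}^d \times \hat{\mathbb{R}}^d$ for each $f \in L^2(\mathbb{R}^d)$.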

The general problem we pose is to analyze and compare the periodic measure $\mu$ and the quasi-periodic function $Gg$ with a view to obtaining results in multidimensional prediction theory and the decomposition theory for coherent states, e.g., Theorem 4.12.

In one direction it is natural to establish the role of $\mu$ in formulating criteria for expansions such as those given in Theorem 4.12, in the case the Zak transform of $g$ is more intractable than $\mu$. In the other direction, we envisage incorporating the Zak transform of $g$ in obtaining "spectral" characterizations of deterministic properties of various complete Weyl-Heisenberg systems indexed by $g \in L^2(\mathbb{R}^d)$. There are partial results, and, assuming further progress can be made, these will appear elsewhere.

5. The spherical Wiener-Plancherel formula


5.1.  Wiener-Plancherel  formulas 

What exactly is a Wiener-Plancherel formula? Given a function $\varphi$ defined on $\mathbb{R}^d$ having Fourier transform $\hat{\varphi}$ defined on $\hat{\mathbb{R}}^d\ (= \mathbb{R}^d)$. Suppose the distribution $\hat{\varphi}$ is intractable, as is likely for poorly behaved $\varphi$. Let $s$ be an operable integral of $\hat{\varphi}$, i.e., suppose that $s$ is a well-behaved function and that $Ls = \hat{\varphi}$ distributionally, for some differential operator $L$. Wiener's idea was to deal with a computable function $s$ instead of the more esoteric distribution $\hat{\varphi}$, and to relate the quadratic behavior of $\varphi$ and $s$. In particular, for the spherical case dealing with balls $B(0,R) = \{t \in \mathbb{R}^d : |t| \leq R\}$ having volumes $|B(0,R)|$, a Wiener-Plancherel formula has the form

$$\lim_{R \to \infty} \frac{1}{|B(0,R)|} \int_{B(0,R)} |\varphi(t)|^2\, dt = Q(s),$$

where $Q(s)$ is an explicit quadratic expression and $Q$, $s$ and $L$ are interdependent, cf. (5.1) for the exact formula. In Wiener's original result ($d = 1$), $Ls$ can be correctly formulated as a first distributional derivative of $s$, and

$$Q(s) = \lim_{\lambda \to 0} \frac{1}{2\lambda} \int |s(\gamma + \lambda) - s(\gamma - \lambda)|^2\, d\gamma.$$


The Plancherel formula allows one to define the Fourier transform of a square-integrable function $f$, and, at certain levels of abstraction, it is considered as characterizing what is meant by an harmonic analysis of $f$. On the other hand, for most applications in $\mathbb{R}^d$, the Plancherel formula assumes the workaday role of an effective tool used to obtain quantitative results. It is this latter role we envisage for Wiener-Plancherel formulas in the non-square-integrable case. After all, distribution theory (in $\mathbb{R}^d$) gives the proper definition of the Fourier transform of tempered distributions. The real issue is to obtain quantitative results for problems where an harmonic analysis of a non-square-integrable function is desired. A host of such problems comes under the heading of an harmonic (spectral) analysis of signals containing non-square-integrable noise and/or random components, whether it be speech recognition, image processing, geophysical modeling, or turbulence in fluid mechanics. Such problems can be attacked by Beurling's profound theory of spectral synthesis, as well as by the extensive multifaceted theory of time series, e.g., [46]. Beurling's spectral synthesis does not deal with energy and power considerations, i.e., quadratic criteria, and time series relies on a stochastic point of view. Our goal is to implement Wiener-Plancherel formulas to address the above-mentioned group of problems. These formulas are well-suited to deal with energy and power; and they provide an analytic device which should dovetail with spectral estimation methods (from time series) developed since Kolmogorov's and Wiener's time.

In Formula 5.1, we shall state our spherical Wiener-Plancherel formula, viz., (5.1), without going into any detail concerning hypotheses and motivation. We feel that the technicalities and hypotheses are sufficiently complex to warrant a displayed version at the outset. The relation between (5.1) and the Wiener-Khinchin theorem, as mentioned in Section 2, becomes apparent in Section 5.4.


Formula 5.1. The spherical Wiener-Plancherel formula is

$$\lim_{R \to \infty} \frac{1}{|B(0,R)|} \int_{B(0,R)} |\varphi(t)|^2\, dt
= \lim_{\lambda \to 0} \frac{c(d,k)(2\pi)^{4k}}{\omega_{d-1} \lambda^{4k-d}} \int |D_\lambda s_k(\gamma)|^2\, d\gamma, \qquad (5.1)$$

cf. Theorem 5.6 for a precise statement of hypotheses for the validity of (5.1). The function $s_k$ is the Wiener s-function,

$$s_k = \hat{\varphi} * E_k, \qquad (5.2)$$

where $\Delta^k E_k = \delta$, $\omega_{d-1}$ is the surface area of the unit sphere $\Sigma_{d-1}$, $c(d,k)^{-1}$ is the $L^1$-norm of a special function related to the Fourier transform of the restriction of surface measure $\sigma_{d-1}$ to $\Sigma_{d-1}$, e.g., Example 5.4,

$$D_\lambda s_k = s_k - M_\lambda s_k,$$

and $M_\lambda$ is the spherical mean-value operator defined by

$$M_\lambda s_k(\gamma) = \frac{1}{\omega_{d-1}} \int_{\Sigma_{d-1}} s_k(\gamma + \lambda\theta)\, d\sigma_{d-1}(\theta).$$

The integer $k$ is related to the dimension $d$, and there must be control of the quadratic means of $\varphi$ over spheres in order to verify (5.1). The operator $L$ described above is the iterated Laplacian $\Delta^k$.

Remark  5.2. 

1) In previous work with my students G. Benke and W. Evans [7], we proved a rectilinear version of (5.1). The rectilinear result is easier to prove than the spherical one, although by no means elementary. Also, in the case of "rectilinear geometry" the operator $L$ is the hyperbolic operator

$$L = \partial_1 \partial_2 \cdots \partial_d,$$

whereas the "spherical geometry" of (5.1) gives rise to the elliptic operator $L = \Delta^k$. This remark indicates there is a range of Wiener-Plancherel formulas according to the number of degrees of freedom available in various convergence criteria.

2) It is natural to expect significant differences between the rectilinear and spherical cases.

The analogous situation with the convergence problem for multiple Fourier series makes this point clear. There are several natural rectilinear convergence criteria for multiple Fourier series, and there exist positive results in some cases. For example, using the Carleson-Hunt theorem for $d = 1$, C. Fefferman [21] (1971) proved that

$$\lim_{R \to \infty} \sum_{m \in RP \cap \mathbb{Z}^d} \hat{\varphi}(m)\, e^{2\pi i m \cdot t} = \varphi(t), \quad \text{a.e.} \qquad (5.3)$$

for $\varphi \in L^p(\mathbb{T}^d)$, $1 < p \leq \infty$, where $P \subseteq \mathbb{R}^d$ is a d-dimensional polygon. The rectilinear convergence we used in [7] is analogous to the so-called "restricted rectangular" convergence criterion in the theory of multiple Fourier series; this criterion is different from that of (5.3). If the polygonal convergence of (5.3) is replaced by spherical convergence, then it is not known whether all the elements of $L^2(\mathbb{T}^d)$, $d > 1$, have a Fourier series representation pointwise a.e. There are negative results if $p < 2$. The problem of multiple Fourier series with spherical convergence criteria is closely related to deep problems associated with Bochner-Riesz multipliers. There are some positive results, and we close this discussion with one such theorem due to Carbery and F. Soria (1988): if $d \geq 2$, $\alpha > 0$, $2 \leq p < 2d/(d-1)$, and $\varphi$ is an element of the Sobolev space $L^p_\alpha(\mathbb{R}^d)$ then

$$\lim_{R \to \infty} \int_{|\gamma| \leq R} \hat{\varphi}(\gamma)\, e^{2\pi i t \cdot \gamma}\, d\gamma = \varphi(t), \quad \text{a.e.}$$


Example 5.3. A formula such as (5.1) establishes a mapping between spaces of functions. For example, if the left side of (5.1) is finite then $\|\varphi\|_{B^2(\mathbb{R}^d)} < \infty$, where $B^2(\mathbb{R}^d)$ consists of functions having bounded quadratic means over spheres. There is a hierarchy of Besicovitch spaces $B(p,q)$ of which $B^2 = B(2,\infty)$. For the right side of (5.1) the corresponding hierarchy $V(p,q)$ is related to Besov spaces. In the case $d = 1$, the mappings $B(2,1) \to V(2,1)$ and $B(2,\infty) \to V(2,\infty)$, established by Wiener's original Wiener-Plancherel formula, are topological isomorphisms. The first mapping is a consequence of an important result by Beurling [13], coupled with an extraordinarily clever insight of my student C. Heil [28]. The second mapping is due to Chen and Lau [14]. Taking $d \geq 1$ and using the rectilinear Wiener-Plancherel formula in [7], Heil also proved that the mapping $B(2,q) \to V(2,q)$ is a topological isomorphism for $1 \leq q \leq \infty$.

5.2. The spherical Wiener-Plancherel formula


As mentioned in Formula 5.1, we need the following example in the basic formula (5.1) and in Theorem 5.6.

Example 5.4. For each dimension $d \geq 2$, we utilize the function,

$$K_k(r) = r^{-(4k-d)} \left( 1 - \frac{2\pi}{\omega_{d-1}}\, r^{-(d-2)/2}\, J_{(d-2)/2}(2\pi r) \right)^2,$$

where $k \geq 0$ is an integer, $r > 0$, and $J_\nu$ is a Bessel function of the first kind.


Definition/Remark 5.5.

1) The space $B^2(\mathbb{R}^d)$ of functions having bounded quadratic means over spheres is the set of all functions $\varphi \in L^2_{loc}(\mathbb{R}^d)$ for which

$$\|\varphi\|_{B^2(\mathbb{R}^d)} = \sup_{R > 0} \left( \frac{1}{|B(0,R)|} \int_{B(0,R)} |\varphi(t)|^2\, dt \right)^{1/2} < \infty. \qquad (5.4)$$

$B^2(\mathbb{R}^d)$ is a Banach space with norm defined by (5.4).

2) Given $\varphi \in L^2_{loc}(\mathbb{R}^d)$. The spherical average of $\varphi$ is the function $\Phi$ defined as

$$\Phi(r) = \frac{1}{\omega_{d-1}} \int_{\Sigma_{d-1}} |\varphi(r\theta)|^2\, d\sigma_{d-1}(\theta), \quad r > 0. \qquad (5.5)$$

3) A basic property of spherical averages, and one that is relevant for comparison with the classical and rectangular Wiener-Plancherel formulas [7], is that

$$\Phi \in L^\infty(0, \infty) \ \text{implies} \ \varphi \in B^2(\mathbb{R}^d). \qquad (5.6)$$

The verification of (5.6) is immediate:

$$\frac{1}{|B(0,R)|} \int_{B(0,R)} |\varphi(t)|^2\, dt
= \frac{\omega_{d-1}}{|B(0,R)|} \int_0^R r^{d-1} \Phi(r)\, dr
\leq \frac{\|\Phi\|_\infty}{d\, |B(0,R)|}\, \omega_{d-1} R^d = \|\Phi\|_\infty.$$

4) Clearly $B^2(\mathbb{R}^d) \setminus L^\infty(\mathbb{R}^d) \neq \emptyset$. In fact, we can choose a continuous radial element $\varphi \in B^2(\mathbb{R}^d)$ for which $\limsup_{r \to \infty} |\varphi(r)| = \infty$. This function also shows that the converse of (5.6) fails since $\Phi(r) = |\varphi(r)|^2$. Further, this observation shows that, for the class of radial continuous functions, $\Phi \in L^\infty$ if and only if $\varphi \in L^\infty(\mathbb{R}^d)$.

The following result is our spherical Wiener-Plancherel formula. Wiener's Tauberian theorem (for the multiplicative group of reals) plays an essential role in its proof.

Theorem 5.6. [6] Given $\varphi \in L^2_{loc}(\mathbb{R}^d)$, which is bounded in a neighborhood of the origin, and an integer $k$ for which $4(k-1) < d < 4k$ and $d > 2k$. ($d > 4(k-1)$ implies $d > 2k$ for $k \geq 2$.) Assume the spherical average $\Phi$ of $\varphi$ is an element of $L^\infty$. Then […] and the spherical Wiener-Plancherel formula (5.1) is valid in the sense that if the left side exists then the right side exists and they are equal. The constant $c(d,k)$ in (5.1) has the explicit representation,

$$c(d,k)^{-1} = \int_0^\infty K_k(r)\, \frac{dr}{r}.$$

Another technical ingredient in the proof of Theorem 5.6 is stated separately below as Theorem 5.7 because of its use in Example 5.4.

5.3.  The  Laplacian  and  spherical  mean  value  operator 

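Theorem 5.7. Given $g \in L^2(\mathbb{R}^d)$, $a \in \mathbb{C}$, and $f \in \mathcal{S}'(\mathbb{R}^d)$. Assume $f$ satisfies the following conditions: $\hat{f} \in \mathcal{S}'(\hat{\mathbb{R}}^d)$ is a Borel measurable function,

$$\exists R > 0 \ \text{such that} \ \hat{f} \in L^2(B(0,R)^c),$$

and

$$|t|^2 \hat{f}(t) \in L^2_{loc}(\hat{\mathbb{R}}^d).$$

Then $f - M_\lambda f \in L^2(\mathbb{R}^d)$ and

$$\|g - a(f - M_\lambda f)\|_2
= \left\| \hat{g}(t) - a \hat{f}(t) \left( 1 - \frac{2\pi}{\omega_{d-1}} (|t|\lambda)^{-(d-2)/2} J_{(d-2)/2}(2\pi |t| \lambda) \right) \right\|_2. \qquad (5.7)$$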

Proof. 

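1) The hypothesis, $|t|^2 \hat{f}(t) \in L^2_{loc}(\hat{\mathbb{R}}^d)$, implies that $\hat{f} \in L^2_{loc}(\hat{\mathbb{R}}^d \setminus \{0\})$.

In fact, if $K \subseteq \hat{\mathbb{R}}^d$ is compact and $0 \notin K$ then

$$\int_K |\hat{f}(t)|^2\, dt = \int_K \frac{1}{|t|^4}\, \big| |t|^2 \hat{f}(t) \big|^2\, dt \leq C \int_K \big| |t|^2 \hat{f}(t) \big|^2\, dt < \infty.$$

If we did not assume $\hat{f}$ to be a Borel measurable function then it could contain terms of the form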
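
2) We now prove

$$\hat{f}(t)\, \theta_\lambda(t) \in L^2(\hat{\mathbb{R}}^d), \qquad (5.8)$$

where

$$\theta_\lambda(t) = 1 - \frac{2\pi}{\omega_{d-1}} (|t|\lambda)^{-(d-2)/2} J_{(d-2)/2}(2\pi |t| \lambda).$$

Since $\hat{f} \in L^2(B(0,R)^c)$ and

$$\sup_{|t| \geq R} |t|^{-(d-2)/2}\, \big| J_{(d-2)/2}(2\pi |t| \lambda) \big|
\leq C \sup_{|t| \geq R} |t|^{-(d-2)/2} (2\pi |t| \lambda)^{-1/2}
\leq C_\lambda \sup_{|t| \geq R} |t|^{-(d-1)/2} \leq C_\lambda R^{-(d-1)/2},$$

it is sufficient for (5.8) to dominate

$$\int_{B(0,R)} \left| \hat{f}(t) \left( 1 - \frac{2\pi}{\omega_{d-1}} (|t|\lambda)^{-(d-2)/2} J_{(d-2)/2}(2\pi |t| \lambda) \right) \right|^2 dt. \qquad (5.9)$$
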

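This integral is

$$\int_{B(0,R)} \left| \hat{f}(t)\, \frac{2\pi^{d/2}}{\omega_{d-1}} \sum_{k=1}^{\infty} \frac{(-1)^k (\pi |t| \lambda)^{2k}}{k!\, \Gamma(\frac{d}{2} + k)} \right|^2 dt,$$

where we have used the series expansion of the Bessel function and the fact that $\omega_{d-1} = 2\pi^{d/2} / \Gamma(d/2)$, so that the $k = 0$ term of the series cancels the constant 1.

Thus, by Minkowski's inequality, we have

$$(5.9)^{1/2} \leq \sum_{k=1}^{\infty} \left( \int_{B(0,R)} \left| \hat{f}(t)\, \frac{2\pi^{d/2}}{\omega_{d-1}}\, \frac{(\pi |t| \lambda)^{2k}}{k!\, \Gamma(\frac{d}{2} + k)} \right|^2 dt \right)^{1/2}
\leq \frac{2\pi^{d/2}}{\omega_{d-1}} \left( \sum_{k=1}^{\infty} \frac{(\pi\lambda)^{2k} R^{2k-2}}{k!\, \Gamma(\frac{d}{2} + k)} \right) \left( \int_{B(0,R)} \big| |t|^2 \hat{f}(t) \big|^2\, dt \right)^{1/2},$$

and this is finite since $|t|^2 \hat{f}(t) \in L^2_{loc}(\hat{\mathbb{R}}^d)$. As a consequence, (5.8) is valid.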

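3) Distributionally, we have

$$\forall \psi \in \mathcal{S}(\mathbb{R}^d), \quad (\hat{f} \theta_\lambda)^{\vee}(\psi) = (f - M_\lambda f)(\psi).$$

The left hand side is

$$\begin{aligned}
\int \hat{f}(t)\, (\theta_\lambda \psi^{\vee})(t)\, dt
&= f\big( (\theta_\lambda \psi^{\vee})^{\wedge} \big) \\
&= f\left( \int \left( 1 - \frac{1}{\omega_{d-1}} \int_{\Sigma_{d-1}} e^{-2\pi i \lambda t \cdot \theta}\, d\sigma_{d-1}(\theta) \right) \psi^{\vee}(t)\, e^{-2\pi i t \cdot \gamma}\, dt \right) \\
&= f\left( \psi - \frac{1}{\omega_{d-1}} \int_{\Sigma_{d-1}} \int \psi^{\vee}(t)\, e^{-2\pi i t \cdot (\gamma + \lambda\theta)}\, dt\, d\sigma_{d-1}(\theta) \right) \\
&= f(\psi - M_\lambda \psi) = f(\psi) - (M_\lambda f)(\psi) = (f - M_\lambda f)(\psi).
\end{aligned}$$

Thus, $f - M_\lambda f = (\hat{f} \theta_\lambda)^{\vee}$.

Since $\hat{f} \theta_\lambda \in L^2(\hat{\mathbb{R}}^d)$, we know that $(\hat{f} \theta_\lambda)^{\vee} \in L^2(\mathbb{R}^d)$ and, hence, that $f - M_\lambda f \in L^2(\mathbb{R}^d)$. (5.7) is a consequence of the Plancherel theorem.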


The operator (on the function $f$),

$$-\frac{2d}{\lambda^2}\, (f - M_\lambda f),$$

corresponds to the Laplacian in $\mathbb{R}^d$ in the same way the difference operator,

$$\frac{1}{2\lambda}\, (T_{-\lambda} f - T_\lambda f), \qquad (5.10)$$

corresponds to the ordinary derivative in $\mathbb{R}$, cf. [7] for the rectangular generalization of (5.10) to $\mathbb{R}^d$ and Item 1 of Remark 5.2 for the corresponding differential operator. Wiener made the following calculation for the case $d = 3$ [53, Volume III, pp. 718-727] (1927).
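
The correspondence between $f - M_\lambda f$ and the Laplacian can be illustrated numerically. The sketch below is not taken from the source: it works in $d = 2$, uses a hypothetical smooth test function, approximates the mean over the circle by equally spaced sample points, and adopts the normalization $-(2d/\lambda^2)(f - M_\lambda f)$, chosen here so that the limit as $\lambda \to 0$ is $\Delta f$.

```python
import numpy as np

def f(p):
    """Test function f(x, y) = exp(x) * cos(2y) + x^2 (hypothetical example)."""
    x, y = p[..., 0], p[..., 1]
    return np.exp(x) * np.cos(2.0 * y) + x**2

def laplacian_f(p):
    """Exact Laplacian of f: -3 * exp(x) * cos(2y) + 2."""
    x, y = p[..., 0], p[..., 1]
    return -3.0 * np.exp(x) * np.cos(2.0 * y) + 2.0

def circle_mean(func, y, lam, n_pts=64):
    """Approximate M_lambda func(y): the mean of func over the circle |x - y| = lam."""
    theta = 2.0 * np.pi * np.arange(n_pts) / n_pts
    pts = y + lam * np.stack([np.cos(theta), np.sin(theta)], axis=1)
    return func(pts).mean()

def scaled_mean_difference(func, y, lam, d=2):
    """-(2d / lam^2) * (func - M_lambda func)(y), which tends to the Laplacian."""
    return -(2.0 * d / lam**2) * (func(y) - circle_mean(func, y, lam))

y0 = np.array([0.3, -0.2])
for lam in (0.1, 0.01):
    err = abs(scaled_mean_difference(f, y0, lam) - laplacian_f(y0))
    print(lam, err)
```

The error shrinks roughly like $\lambda^2$, in the same way that the central difference (5.10) approximates the derivative to second order.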
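
"Theorem" 5.8. Given $f \in \mathcal{S}'(\mathbb{R}^d)$ for which $\Delta f \in L^2(\mathbb{R}^d)$ and $\hat{f}$ is Borel measurable. Then

$$\lim_{\lambda \to 0} \left\| \Delta f + \frac{2d}{\lambda^2}\, (f - M_\lambda f) \right\|_2 = 0. \qquad (5.11)$$

"Proof". Since $\Delta f$ is a convolution of $f \in \mathcal{S}'(\mathbb{R}^d)$ and a distribution having compact support, the exchange formula is valid and $(\Delta f)^{\wedge}(t) = -4\pi^2 |t|^2 \hat{f}(t) \in \mathcal{S}'(\hat{\mathbb{R}}^d)$. The hypothesis, $\Delta f \in L^2(\mathbb{R}^d)$, allows us to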

5.4.  Multidimensional  spectral  estimation 

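Definition 5.9. Given $\varphi \in L^2_{loc}(\mathbb{R}^d)$ and define

$$\forall R > 0, \quad P_{\varphi,R}(t) = \frac{1}{|B(0,R)|} \int_{B(0,R)} \varphi(t + x)\, \overline{\varphi(x)}\, dx.$$

Suppose that there is a continuous positive definite function $P_\varphi$ for which $\lim_{R \to \infty} P_{\varphi,R} = P_\varphi$ in the $\sigma(M(\mathbb{R}^d), C_c(\mathbb{R}^d))$ topology, where $M(\mathbb{R}^d)$ is the space of measures on $\mathbb{R}^d$. Then $P_\varphi$ is the autocorrelation of $\varphi$, and the positive measure $\mu_\varphi = \hat{P}_\varphi$ is the power spectrum of $\varphi$, cf. the Wiener-Khinchin theorem (Theorem 2.2).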

Remark 5.10.

1) Depending on the particular problem, the weak topology in Definition 5.9 can be replaced by various other convergence criteria, including pointwise convergence.

2) Given $\varphi \in L^2_{loc}(\mathbb{R}^d)$ with power spectrum $\mu_\varphi$ and assume there is an increasing function $i(R)$ on $(0, \infty)$ for which $\sup_{|t| \leq R} |\varphi(t)| \leq i(R)$ and $\lim_{R \to \infty} i(R)^2 / R = 0$. Then we can prove that

$$\forall \psi \in C_c(\mathbb{R}^d), \quad \lim_{R \to \infty} \frac{1}{|B(0,R)|} \int_{B(0,R)} |\psi * \varphi(t)|^2\, dt = \int |\hat{\psi}(\gamma)|^2\, d\mu_\varphi(\gamma), \qquad (5.14)$$

[2, Section 5].

If we take $\hat{\psi} \equiv 1$ in (5.14) then the left side of (5.14) is the arithmetic mean on the left side of our Wiener-Plancherel formula (5.1). Given $\varphi \in L^2_{loc}(\mathbb{R}^d)$ and combining the formulas (5.1) and (5.14), it is then reasonable to expect that

$$\lim_{\lambda \to 0} \frac{c(d,k)(2\pi)^{4k}}{\omega_{d-1} \lambda^{4k-d}}\, |D_\lambda s_k|^2 = \mu_\varphi \qquad (5.15)$$

in some weak topology. In this same spirit we provide the following calculation which Wiener made for the case $d = 1$ [53, Volume II, pp. 219-223] (1930).


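Formula calculation 5.11. Given $\varphi \in L^2_{loc}(\mathbb{R}^d)$ with autocorrelation $P_\varphi$. For $t \in \mathbb{R}^d$,

$$\lim_{\lambda \to 0} \frac{c(d,k)(2\pi)^{4k}}{\omega_{d-1} \lambda^{4k-d}} \int |D_\lambda s_k(\gamma)|^2\, e^{2\pi i t \cdot \gamma}\, d\gamma = P_\varphi(t). \qquad (5.16)$$

"Proof".

1) A direct calculation gives

$$\begin{aligned}
P_{\varphi,R}(t) = \frac{1}{4\, |B(0,R)|} \int_{B(0,R)} \Big\{ & |\varphi(t+x) + \varphi(x)|^2 - |\varphi(t+x) - \varphi(x)|^2 \\
& + i\, |\varphi(t+x) + i\varphi(x)|^2 - i\, |\varphi(t+x) - i\varphi(x)|^2 \Big\}\, dx \\
= \tfrac{1}{4} \big( K_1 - K_2 & + iK_3 - iK_4 \big). \qquad (5.17)
\end{aligned}$$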
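
2) Let $\psi(x) = \varphi(t + x) + c\varphi(x)$, where $|c| = 1$; and write $s_k(\varphi)(\gamma) = (\varphi E_k^{\vee})^{\wedge}(\gamma)$, so that $s_k(\varphi) = s_k$. By the Wiener-Plancherel formula we compute the following, where $C_\lambda = c(d,k)(2\pi)^{4k} / (\omega_{d-1} \lambda^{4k-d})$:

$$\begin{aligned}
\lim_{R \to \infty} \frac{1}{|B(0,R)|} \int_{B(0,R)} |\psi(x)|^2\, dx
&= \lim_{\lambda \to 0} C_\lambda \int \big| D_\lambda s_k(T_{-t}\varphi)(\gamma) + c\, D_\lambda s_k(\varphi)(\gamma) \big|^2\, d\gamma \\
&= \lim_{\lambda \to 0} C_\lambda \int \big| D_\lambda s_k(T_{-t}\varphi)(\gamma) - e^{2\pi i t \cdot \gamma} D_\lambda s_k(\varphi)(\gamma) + (c + e^{2\pi i t \cdot \gamma}) D_\lambda s_k(\varphi)(\gamma) \big|^2\, d\gamma \\
&= \lim_{\lambda \to 0} C_\lambda \int |D_\lambda s_k(\varphi)(\gamma)|^2\, |c + e^{2\pi i t \cdot \gamma}|^2\, d\gamma + E, \qquad (5.18)
\end{aligned}$$

where the "error" $E$ is estimated by

$$\lim_{\lambda \to 0} C_\lambda \int \big| D_\lambda s_k(T_{-t}\varphi)(\gamma) - e^{2\pi i t \cdot \gamma} D_\lambda s_k(\varphi)(\gamma) \big|^2\, d\gamma. \qquad (5.19)$$

Under natural hypotheses, and implementing Theorem 5.7, we can show that the limit in (5.19) vanishes, i.e., $E = 0$.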
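3) We now combine the right side of (5.17) and (5.18) with $E = 0$, for the four cases $c = \pm 1, \pm i$. Thus,

$$\begin{aligned}
P_\varphi(t) = \lim_{\lambda \to 0} \frac{1}{4}\, \frac{c(d,k)(2\pi)^{4k}}{\omega_{d-1} \lambda^{4k-d}} \int |D_\lambda s_k(\gamma)|^2\, \Big\{ & \big(2 + c_{-t}(\gamma) + c_t(\gamma)\big) - \big(2 - c_{-t}(\gamma) - c_t(\gamma)\big) \\
& + i\big(2 + i c_{-t}(\gamma) - i c_t(\gamma)\big) - i\big(2 - i c_{-t}(\gamma) + i c_t(\gamma)\big) \Big\}\, d\gamma,
\end{aligned}$$

where $c_t(\gamma) = e^{2\pi i t \cdot \gamma}$. Combining terms, we obtain (5.16).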

Formally, (5.15) and (5.16) are compatible. If we are given data $\varphi_k$ on a set $S$, these formulas lead us to consider multidimensional spectral estimators molded from expressions of the form

c(d,k)(27xl-'*'  I  ,  ,  ,  1^ 


cu,i  -lA 


,,k.<,  nk((().s‘fk)  . 




Instead of continuing this section with a quodlibetic discussion of spectral estimation, we shall refer to the classical spectral estimation algorithms and results on evolutionary spectra for nonstationary processes, e.g., [46, Chapter 11].

6.  Notation 

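Let $G$ be a locally compact abelian group with dual group $\Gamma$, e.g., $G = \mathbb{R}^d$ and $\Gamma = \hat{\mathbb{R}}^d$, where $\mathbb{R}^d$ is d-dimensional Euclidean space, or $G = \mathbb{T} = \mathbb{R}/\mathbb{Z}$ and $\Gamma = \mathbb{Z}$, the group of integers. $M_b(\Gamma)$, resp., $M_+(\Gamma)$, is the space of bounded, resp., positive, Radon measures on $\Gamma$; and $M_{b+}(\Gamma) = M_b(\Gamma) \cap M_+(\Gamma)$. $L^p_\mu(\Gamma)$ is the weighted $L^p$-space defined by its norm, $\|F\|_{p,\mu} = \left( \int |F(\gamma)|^p\, d\mu(\gamma) \right)^{1/p}$, where $1 \leq p < \infty$, $\mu \in M_+(\Gamma)$, and integration is over $\Gamma$.

The Fourier transform of $f \in L^1(\mathbb{R}^d)$ is $\hat{f}(\gamma) = \int f(t)\, e^{-2\pi i t \cdot \gamma}\, dt$, where integration is over $\mathbb{R}^d$; and $\mu^{\vee}$ designates the inverse Fourier transform of $\mu \in M_b(\Gamma)$.

If $S \subseteq G$ then $|S|$ is its Haar measure and $1_S$ is the characteristic function of $S$. $\delta_{m,n}$ is 1 if $m = n$ and 0 if $m \neq n$. Finally, if $X$ is contained in a topological vector space $H$ then $\mathrm{sp}\, X$ is the linear span of $X$ in $H$, and $\overline{\mathrm{sp}}\, X$ is its closure in $H$.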

The theory of iterated fuzzy set systems, IFZS, was introduced by Cabrelli et al. in [4]. They showed that by combining the idea of representing an image as a fuzzy set with the theory of iterated function systems, it is possible to generate images with grey or colour levels as attractors of IFZS. The purpose of this paper is to show that the class of attractors of IFZS is dense in the class of images, i.e., each image can be approximated with the desired accuracy. A brief review of the main concepts of IFZS is presented first.


1. Introduction

We first want to present an overview of the theory of iterated fuzzy set systems (IFZS). Since a complete development of the theory can be found in [4], we are going to omit most of the proofs. We then show that the set of images that can be obtained using this approach is dense in the set of all images.

The notion of self-similarity and its generalizations¹ has found a natural frame in the theory of iterated function systems (IFS): self-similar sets became attractors of certain systems of maps [10, 1, 8]. The generalization of the concept of self-similarity to a more general class of maps, other than similarities, introduced more flexibility in the model, widening the class of sets that have the property of being expressed as smaller copies of themselves. On the other hand, the use of IFS enabled the construction of self-similar sets of fractional dimensions, and therefore this theory has found wide applications in computer graphics to generate fractal images on computers (see for example [3, 13]). The ergodicity involved in the process is another advantage that this method provides in image generation and representation, see [7].

¹ A subset $S$ of an arbitrary set $X$ is said to be self-similar (in the wide sense) if there exist a finite number of maps $f_1, \ldots, f_n$, $f_i : X \to X$, such that $S = \bigcup_i f_i(S)$.

One of the major applications of IFS theory in image processing is in data compression: huge amounts of data can be squeezed into a small number of parameters. Two questions naturally arise:


• Which kind of images can be represented through this model, or, how big is the class of images that can be represented through IFS?

• Is there an efficient algorithm or method to find that representation?


Regarding the first question, in the case that the maps are contractive but not necessarily similarities, it has been shown [9] that this class is dense in the class of compact sets. In image processing language this means that to any object in a black and white image, one can associate an IFS code. This result shows that the so-called inverse problem for fractals and other sets, that is, to find the IFS code associated with any given black and white image, has at least one solution. It is well known, however, that in most cases the solution that can be constructed from the proof of the theorem does not yield a good compression rate. It is a very difficult problem to find an efficient IFS code for a given black and white image. Some results in that direction for the one dimensional case can be found in [2, 5, 16].

In the case of images with grey-levels, the IFS theory provides us with a class of measures that are generated by adding a probability vector to each IFS code. The ergodicity allows one to generate this measure through a random iterative algorithm. This approach, however, seems to have two weak points: first, the relation between the parameters and the resulting measure is not straightforward, and this then becomes a serious difficulty for the inverse problem. Secondly, the class of measures that can be obtained through IFS seems not to be as wide as desirable. The question of how big this class of measures is in relation to a suitable space of measures (here suitable refers to images) seems to be still open.

The IFZS approach to grey-level images considers images as functions rather than measures, and thereby tends to avoid these problems. In that direction, Theorem 3.1 of this paper shows that the class of images that can be generated using IFZS is dense in the class of images, i.e., given a grey-level image, we prove that for a given $\varepsilon$ there exists an IFZS whose attractor is closer than $\varepsilon$ to that image.

2. The iterated fuzzy set systems (IFZS)


2.1.  Iterated  function  systems  (IFS) 

Let us briefly recall the basic notions of IFS. Given a compact metric space $(X, d)$ with distance $d$, let us consider $N$ contraction mappings $w_i : X \to X$. The metric space $X$, together with the $N$ contraction mappings, is referred to as an Iterated Function System (IFS) and denoted by $\{X, w\}$. Usually, in applications, $X$ is a compact subset of $\mathbb{R}^n$.

If $\mathcal{H}(X)$ denotes the set of all nonempty closed subsets of $X$, we can define $N$ set-valued maps $\hat{w}_i : \mathcal{H}(X) \to \mathcal{H}(X)$, by $\hat{w}_i(S) = \{w_i(x) : x \in S\}$, i.e., the image of $S$ under the transformation $w_i$, for all $S \in \mathcal{H}(X)$. If $h$ is the Hausdorff distance in $\mathcal{H}(X)$:

$$h(A, B) := \max\{D(A, B), D(B, A)\} \qquad (2.1)$$

where

$$D(A, B) = \sup_{x \in A} \inf_{y \in B} d(x, y) \qquad (2.2)$$

then $(\mathcal{H}(X), h)$ is a compact metric space, and the $\hat{w}_i$ are contraction mappings of $\mathcal{H}(X)$. The map $W : \mathcal{H}(X) \to \mathcal{H}(X)$ defined by:

$$W(S) = \bigcup_{i=1}^{N} \hat{w}_i(S), \quad \forall S \in \mathcal{H}(X) \qquad (2.3)$$

is also a contraction on $\mathcal{H}(X)$. Therefore it possesses a unique fixed point (or invariant set) $\mathcal{A}$, called the attractor of the IFS:

$$\mathcal{A} = W(\mathcal{A}) = \bigcup_{i=1}^{N} \hat{w}_i(\mathcal{A}). \qquad (2.4)$$

This shows that $\mathcal{A}$ is self-similar with respect to $\hat{w}_1, \ldots, \hat{w}_N$. This property is sometimes referred to as the self-tiling property of IFS attractors, meaning that $\mathcal{A}$ can be built with smaller copies of itself. As well, the name attractor is justified by the following property:

$$h(W^n(S), \mathcal{A}) \to 0 \quad \text{as } n \to \infty, \ \forall S \in \mathcal{H}(X). \qquad (2.5)$$
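
The convergence (2.5) can be observed on a small example. The following sketch assumes, purely for illustration, the two maps $w_1(x) = x/3$ and $w_2(x) = x/3 + 2/3$ on $X = [0, 1]$ (whose attractor is the Cantor set) and works with finite point sets, for which the Hausdorff distance (2.1) reduces to maxima and minima over pairwise distances. The names `hausdorff`, `MAPS`, and `W` are ours.

```python
import numpy as np

def hausdorff(A, B):
    """Hausdorff distance (2.1) between finite subsets A, B of the real line."""
    d = np.abs(A[:, None] - B[None, :])
    # D(A, B) = sup_{x in A} inf_{y in B} d(x, y), and symmetrically D(B, A).
    return max(d.min(axis=1).max(), d.min(axis=0).max())

# Two contractions with ratio 1/3 (an assumed example; the attractor is the Cantor set).
MAPS = (lambda x: x / 3.0, lambda x: x / 3.0 + 2.0 / 3.0)

def W(S):
    """W(S) = union of the images of S under the maps, as in (2.3)."""
    return np.concatenate([w(S) for w in MAPS])

S = np.array([0.0])
dists = []
for _ in range(8):
    S_next = W(S)
    dists.append(hausdorff(S, S_next))
    S = S_next

print(dists)  # successive distances shrink by the contraction ratio 1/3
```

Because successive iterates get closer at a geometric rate, the sequence $W^n(S)$ is Cauchy in $(\mathcal{H}(X), h)$, which is the mechanism behind the fixed point in (2.4).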

2.2.  Fuzzy  sets  as  generalization  of  sets 

The notion of fuzzy sets, introduced by Zadeh in 1965 [17], has been widely used in different contexts. We want to use it here in the sense of a generalization of the concept of set: if $X$ is an arbitrary (nonempty) set, a fuzzy set (in $X$) is a function $u$ with domain $X$ and values in $[0, 1]$, i.e., $u : X \to [0, 1]$. In particular, if $S$ is an ordinary subset of $X$, its characteristic function $\chi_S$ is a fuzzy set. To relate this concept with images, we think of a digitized picture as a set of pixels, each of which has an associated grey-level, the value 1 representing black or the foreground, the value 0 representing white or the background. The value $u(x)$ then corresponds to the grey-level of the pixel $x$. If the image is black and white, we only have two values: 0 or 1, and therefore we can represent it by a characteristic function, or a "set."

If $\mathcal{F}(X)$ denotes the class of all fuzzy sets in a metric space $(X, d)$, i.e., all functions $u : X \to [0, 1]$, we are going to restrict ourselves to a subclass $\mathcal{F}^*(X) \subseteq \mathcal{F}(X)$: namely, $u \in \mathcal{F}^*(X)$ if and only if:

1) $u \in \mathcal{F}(X)$,

2) $u$ is upper semicontinuous (u.s.c.) on $(X, d)$,

3) $u$ is normal, that is $u(x_0) = 1$ for some $x_0 \in X$.

These properties yield the following results:

a: For each $0 < \alpha \leq 1$, the $\alpha$-level set, defined as $[u]^\alpha := \{x \in X : u(x) \geq \alpha\}$ is a nonempty compact subset of $X$,
b: The closure of $\{x \in X : u(x) > 0\}$, denoted by $[u]^0$, is also a nonempty compact subset of $X$.

Note that the characteristic function of a closed set is in $\mathcal{F}^*(X)$. We also want to point out here that the level sets of the fuzzy set $u$ completely characterize $u$, i.e., knowing $u(x)$, $\forall x \in X$, is equivalent to knowing $[u]^\alpha$, $0 \leq \alpha \leq 1$.

By the above properties, $[u]^\alpha \in \mathcal{H}(X)$, $0 \le \alpha \le 1$. We now introduce
the metric $d_\infty$ on $\mathcal{F}^*(X)$ (see [6]), which has been used in many applications
of fuzzy set theory [11, 12, 15]:

$$d_\infty(u, v) = \sup_{0 \le \alpha \le 1} h([u]^\alpha, [v]^\alpha), \quad \forall u, v \in \mathcal{F}^*(X). \qquad (2.6)$$

Here $h$ is the Hausdorff metric introduced in (2.1). The metric space
$(\mathcal{F}^*(X), d_\infty)$ is complete. This space represents the generalization of the
space $(\mathcal{H}(X), h)$ to fuzzy sets.
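
To make (2.6) concrete, the following sketch evaluates $d_\infty$ for fuzzy sets on a finite subset of the real line. It assumes the fuzzy sets are stored as dictionaries mapping points to grey-levels and that both are normal; the names `level_set` and `d_inf` and the sample data are hypothetical. On a finite set the level sets only change at the grey values actually taken, so the supremum over $\alpha$ reduces to a maximum over finitely many levels.

```python
def hausdorff(A, B):
    """Hausdorff distance between two finite, nonempty subsets of the real line."""
    def one_sided(P, Q):
        return max(min(abs(p - q) for q in Q) for p in P)
    return max(one_sided(A, B), one_sided(B, A))

def level_set(u, alpha):
    """[u]^alpha for a fuzzy set u given as {point: grey_level}."""
    if alpha == 0.0:
        # On a finite set the closure of the support is the support itself.
        return [x for x, grey in u.items() if grey > 0.0]
    return [x for x, grey in u.items() if grey >= alpha]

def d_inf(u, v):
    """The metric of (2.6), assuming u and v are normal (some grey-level equals 1)."""
    alphas = {0.0}
    alphas.update(g for g in u.values() if g > 0.0)
    alphas.update(g for g in v.values() if g > 0.0)
    return max(hausdorff(level_set(u, a), level_set(v, a)) for a in alphas)

# Two hypothetical three-pixel "images" on the points 0, 1, 2.
u = {0.0: 1.0, 1.0: 0.5, 2.0: 0.0}
v = {0.0: 1.0, 1.0: 0.0, 2.0: 0.5}
print(d_inf(u, v))
```

Here the two images agree at level 1 but differ at every lower level, which is what $d_\infty$ picks up.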

At this point we want to incorporate the IFS theory into the fuzzy set
framework. Therefore, we first use the extension principle for fuzzy sets [18, 14]
in order to extend the set-valued maps $w_i$ defined in Section 2.1 to maps
between fuzzy sets, i.e., we want to define a map from $\mathcal{F}^*(X)$ to $\mathcal{F}^*(X)$ which
is equal to $w_i$ (with the earlier mentioned identification) when its domain is
restricted to the characteristic function of a set. Therefore we define for each
$u \in \mathcal{F}^*(X)$ and each subset $B$ of $X$,

$$\tilde{u}(B) := \sup\{u(y) : y \in B\} \ \text{if } B \ne \emptyset, \qquad \tilde{u}(\emptyset) := 0, \qquad (2.7)$$

which implies, in particular, $\tilde{u}(\{x\}) = u(x)$ at each $x \in X$.

For each $w_i$, $i = 1, 2, \ldots, N$, and each $x \in X$ we now define

$$u_i(x) := \tilde{u}(w_i^{-1}(\{x\})), \qquad (2.8)$$

where, of course, $w_i^{-1}(\{x\}) = \emptyset$ if $x \notin w_i(X)$. If $u \in \mathcal{F}^*(X)$, then each of these
functions $u_i : X \to [0,1]$ is a fuzzy set in $\mathcal{F}^*(X)$ (see [4]).

In fuzzy set theory, the union of two fuzzy sets $u, v$ is usually defined as
the fuzzy set $\sup(u, v)$. We could then generalize the contraction mapping
$W$ given by equation (2.3) to a map $w : \mathcal{F}^*(X) \to \mathcal{F}^*(X)$ defined by:

$$w(u)(x) = \sup_{1 \le i \le N} u_i(x), \quad \text{for each } u \in \mathcal{F}^*(X). \qquad (2.9)$$

In [4] it is shown that this is a contraction mapping on $\mathcal{F}^*(X)$ with the
$d_\infty$-metric. Therefore it has a unique fuzzy attractor $u^* \in \mathcal{F}^*(X)$, i.e., $w(u^*) = u^*$.
It turns out, however, that this fuzzy attractor is the characteristic function
of the attractor of the IFS $\{X, w\}$. This means that the direct generalization
of the IFS theory to fuzzy sets does not provide us with a bigger class of
attractors. We will see in the next section how this class can be enlarged
without losing the contractivity of the map $w$.

2.3.  Modification  of  the  grey-levels  of  the  attractor 

In order to gain more generality with the fuzzy set model, the "grey-level
maps" are introduced. To each $u_i(x)$ defined in (2.8), a grey-level map
$\varphi_i : [0,1] \to [0,1]$ is associated, in order to modify the values of $u_i$, that is,
the grey-levels.

Now the supremum of (2.9) is taken over the functions $u_i$ modified by
the functions $\varphi_i$, i.e.,

$$u \mapsto \sup_{1 \le i \le N} \varphi_i \circ u_i. \qquad (2.10)$$

In other words, an operator $T_s : \mathcal{F}^*(X) \to \mathcal{F}^*(X)$ is introduced:

$$(T_s u)(x) := \sup\{\varphi_1(u_1(x)), \ldots, \varphi_N(u_N(x))\}
= \sup\{\varphi_1(\tilde{u}(w_1^{-1}(x))), \ldots, \varphi_N(\tilde{u}(w_N^{-1}(x)))\}. \qquad (2.11)$$

In order for the operator to be well defined, the grey-level functions $\varphi_i$
have to satisfy certain conditions, namely: for $i = 1, 2, \ldots, N$,

1) $\varphi_i : [0,1] \to [0,1]$ is non-decreasing,

2) $\varphi_i$ is right continuous on $[0,1)$,

3) $\varphi_i(0) = 0$, and

4) for at least one $j \in \{1, 2, \ldots, N\}$, $\varphi_j(1) = 1$.

The fact that the $\varphi_i$ are non-decreasing and right continuous guarantees the
upper semicontinuity of $\varphi_i \circ u$ for any $u$ in $\mathcal{F}^*(X)$; moreover, these are necessary
and sufficient conditions [4]. Property 3) is a natural assumption in the
consideration of grey-level functions: if the grey-level of a point (pixel) $x \in X$
is zero (the pixel is in the background), then it should remain zero after being
acted upon by the $\varphi_i$ maps.

The set of maps $\Phi = \{\varphi_i, i = 1, 2, \ldots, N\}$ satisfying the above conditions,
together with the $N$ contraction maps $w_i$ (which then yield the $u_i$), form
the Iterated Fuzzy Set System (IFZS) denoted $\{X, w, \Phi\}$.

In [4] it is shown that the operator $T_s$ as defined in (2.11) is indeed a
contraction mapping on $(\mathcal{F}^*(X), d_\infty)$, i.e., $T_s$ maps $\mathcal{F}^*(X)$ into itself and there
exists an $s$, $0 \le s < 1$, such that

$$d_\infty(T_s u, T_s v) \le s \, d_\infty(u, v) \quad \forall u, v \in \mathcal{F}^*(X). \qquad (2.12)$$

Therefore, by the Contraction Mapping Principle, $T_s$ possesses a unique
fixed point $u^*$, that is:

$$T_s u^* = u^*. \qquad (2.13)$$

This implies that there exists a unique solution to the functional equation in
the unknown $u \in \mathcal{F}^*(X)$,

$$u(x) = \sup\{\varphi_1(\tilde{u}(w_1^{-1}(x))), \varphi_2(\tilde{u}(w_2^{-1}(x))), \ldots, \varphi_N(\tilde{u}(w_N^{-1}(x)))\}, \qquad (2.14)$$

for all $x \in X$. The fuzzy set solution, $u^*$, will be called the attractor or
fuzzy attractor of the IFZS, since it follows from the Contraction Mapping
Principle that

$$d_\infty((T_s)^n v, u^*) \to 0 \ \text{as } n \to \infty, \quad \forall v \in \mathcal{F}^*(X). \qquad (2.15)$$

It is easy to find examples showing that these fuzzy attractors are no
longer only characteristic functions of closed sets. Hence, using IFZS, the
class of images that can be obtained using IFS has been widened. In
Section 3, we show in fact that any image can be obtained (up to an $\varepsilon$) as a
fuzzy attractor of an IFZS. Note that in the case that all $\varphi_i$ are the identity
maps, the operator reduces to the one defined in equation (2.9).
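
One such example can be sketched numerically. The code below iterates the operator of (2.11) for a hypothetical one-dimensional IFZS on $X = [0,1]$; the two affine maps, the two grey-level maps and all function names are assumptions made for illustration only. For an invertible affine map the preimage of a point is a single point or empty, so the supremum in (2.7) reduces to evaluating $u$ at that point, with the value 0 when the preimage is empty.

```python
# Hypothetical 1-D IFZS on X = [0, 1]; maps and grey-level functions are
# illustrative choices, not taken from the paper.
AFFINE_MAPS = [(0.5, 0.0), (0.5, 0.5)]          # w_i(x) = a * x + b
GREY_MAPS = [lambda t: t, lambda t: 0.5 * t]    # phi_i(0) = 0, phi_1(1) = 1

def apply_T(u):
    """Return the function T_s u built from u as in (2.11)."""
    def Tu(x):
        best = 0.0  # value assigned to an empty preimage, see (2.7)
        for (a, b), phi in zip(AFFINE_MAPS, GREY_MAPS):
            y = (x - b) / a          # the only candidate point of w_i^{-1}({x})
            if 0.0 <= y <= 1.0:      # otherwise the preimage in X is empty
                best = max(best, phi(u(y)))
        return best
    return Tu

def iterate_T(u, n):
    """Apply T_s to u n times."""
    for _ in range(n):
        u = apply_T(u)
    return u

u0 = lambda x: 1.0            # characteristic function of X
u20 = iterate_T(u0, 20)       # approximation of the fuzzy attractor
print(u20(0.0), u20(0.5), u20(0.75), u20(1.0))
```

The iterate takes intermediate grey-levels such as 0.5 and 0.25, so for this choice of grey-level maps the limit is not the characteristic function of a set.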

2.4.  Properties  of  the  fuzzy  attractors 

It  is  worth  mentioning  several  properties  of  the  fuzzy  attractors.  The  proofs 
can  be  found  in  [4]. 

Property 2.1. If $A \in \mathcal{H}(X)$ is the attractor of the IFS $\{X, w\}$, and $u^* \in \mathcal{F}^*(X)$
denotes the fuzzy attractor of the IFZS $\{X, w, \Phi\}$, then support$(u^*) \subseteq A$, that
is, $[u^*]^0 \subseteq A$.

This means that, using the grey-level maps, we are able to modify
the support of our attractor, allowing for example a rough approximation
through the $w_i$, and then a "fine-tuning" using the $\varphi_i$. This property may
be used for applications, if we want to find the IFZS code for an image. Note
that support$(u^*)$ is exactly equal to $A$ in the following two cases:

• For all $i \in \{1, 2, \ldots, N\}$, $\varphi_i(1) = 1$; then $u^* = \chi_A$.
• For all $i \in \{1, 2, \ldots, N\}$, the $\varphi_i$ are increasing at 0 (i.e., $\varphi_i^{-1}(0) = \{0\}$).
  Indeed, in this case $[\varphi_i \circ u^*]^0 = [u^*]^0$, so that
  $[u^*]^0 = \bigcup_{i=1}^{N} w_i([\varphi_i \circ u^*]^0) = \bigcup_{i=1}^{N} w_i([u^*]^0) = W([u^*]^0)$,
  and the uniqueness of the fixed point of $W$ gives $[u^*]^0 = A$.

We should also point out that in the case that $\varphi_i(0) > 0$ for one $i \in
\{1, 2, \ldots, N\}$, this inclusion is no longer true.

Property 2.2. The level sets of the fuzzy attractor satisfy a generalized
self-tiling condition:

$$[u^*]^\alpha = \bigcup_{i=1}^{N} w_i([\varphi_i \circ u^*]^\alpha), \quad 0 \le \alpha \le 1. \qquad (2.16)$$

This condition is a consequence of the property of the operator $T_s$:

$$[T_s u]^\alpha = \bigcup_{i=1}^{N} w_i([\varphi_i \circ u]^\alpha), \quad \forall u \in \mathcal{F}^*(X) \ \text{(see [4])}. \qquad (2.17)$$

This property is interesting, since it shows that the fuzzy attractor is no longer
self-similar, in the sense that it is no longer the union of smaller copies of
itself, but rather a union of modified copies of itself. The modification is given
by the grey-level maps.

Property 2.3 (IFZS Collage Theorem). Let $u \in \mathcal{F}^*(X)$ and suppose that
there exists an IFZS $\{X, w, \Phi\}$ so that

$$d_\infty(u, T_s u) < \varepsilon, \qquad (2.18)$$

where the operator $T_s$ is defined by (2.11). Then

$$d_\infty(u, u^*) < \frac{\varepsilon}{1 - s}, \qquad (2.19)$$

where $u^* = T_s u^*$ is the invariant fuzzy set of the IFZS, and $s$ is the maximum
contraction factor of the $w_i$.

This means that if the $w_i$ are very contractive (i.e., $s$ is very small),
every fuzzy set that remains relatively unchanged after the application of
the operator $T_s$ is close to the fuzzy attractor.

This property, a direct consequence of the contractivity of $T_s$, is (as for
IFS) very useful for the inverse problem.
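
For completeness, the step from contractivity to the collage bound can be written out as follows. This is the standard fixed-point argument, given here as a sketch rather than as the proof of [4]; it uses only the triangle inequality for $d_\infty$, the fixed-point equation $T_s u^* = u^*$, and the contraction estimate (2.12).

```latex
\begin{align*}
d_\infty(u, u^*)
  &\le d_\infty(u, T_s u) + d_\infty(T_s u, T_s u^*) \\
  &<   \varepsilon + s \, d_\infty(u, u^*),
\end{align*}
so that $(1 - s)\, d_\infty(u, u^*) < \varepsilon$, that is,
$d_\infty(u, u^*) < \varepsilon / (1 - s)$.
```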
3. Density of fuzzy attractors

In this section we will show that the class of fuzzy attractors is dense in
$\mathcal{F}^*(X)$ with the $d_\infty$-metric. In other words, given a fuzzy set $u$ in $\mathcal{F}^*(X)$,
and $\varepsilon > 0$, we can always find a natural number $N$, $N$ contraction mappings
$w_i : X \to X$, and $N$ grey-level maps $\varphi_i : [0,1] \to [0,1]$, such that the fuzzy
attractor $u^*$ of the associated IFZS $\{X, w, \Phi\}$ satisfies $d_\infty(u, u^*) < \varepsilon$. We
therefore have the following:

Theorem 3.1. If $X \subset \mathbb{R}^n$ is compact and $(\mathcal{F}^*(X), d_\infty)$ is defined as above,
then the class

$$D = \{u^* \in \mathcal{F}^*(X) : u^* \text{ is attractor of some IFZS on } X\}$$

is dense in $(\mathcal{F}^*(X), d_\infty)$.


Proof. 

Let $\varepsilon > 0$ and $u \in \mathcal{F}^*(X)$. The idea of the proof is to find $N \in \mathbb{N}$,
$w = \{w_1, \ldots, w_N\}$ and $\Phi = \{\varphi_1, \ldots, \varphi_N\}$ such that:


1)  supci  <  \  (ci  is  the  contractivity  factor  of  v.'i); 

2)  doc(TsU,u)  <  where  is  the  operator  associated  to  iX.w.O,. 

Then,  using  the  IFZS  collage  theorem  (Property  2.3)  from  1 )  and  2)  we  have: 


dx  (u,u‘ 


where $u^*$ is the attractor of the IFZS $\{X, w, \Phi\}$, i.e., $T_s u^* = u^*$.

Let  us  now  find  w  and  tP,  such  that  1)  and  2)  are  satisfied:  Let  \  -r  51, 
and  xi , . . . ,  xn  be  an  \  -net  of  [ul'’\  i.e.,  (u;^'  ■:  ,  B(,  where  B,  -  BIxi,  j ), 

are  the  open  balls  of  radius  \  centered  at  x,. 

Let  w,  :  X  X,  u'dX)  c  Bi,  i  -=  1 . N  be  contraction  mappings 

with  contraction  factor  ci,  with  Ci  ■-  Choose  now  «o  0  and  «i  -- 
sup^gjr  u(x). 

Then  for  0  s-,  «  s  1  we  have  (ul“  e  (J.j  B;. 

We  now  choose  ipi  non-decreasing,  right  continuous,  such  that  4)1  (x)  i; 

Vx  [0, 11,  and  4)i(  1 1  -  ot,,  t  ^  I . N.  For  example,  ihe  stcpfunctions 

satisfy  these  conditions. 

Then 


[cp,  -  ur 


0  a  L  ix  w 
0  a  a;. 


But using condition (2.17), we have


N 

(IsUl”  -  (J  w.Kcpi  ;.u!“)  -  (J  Widtpiouri. 

*  ^  !  c  a  \  cx  I } 
Now, if the $\delta$-dilatation $D_\delta(S)$ of a nonempty closed set $S$ is
$D_\delta(S) = \{x \in X : d(x, S) < \delta\}$, $\delta > 0$, we can observe that for $0 < \alpha \le 1$

■UO  C  y  B,  ...  D^(iu;^)  iS.l! 

I  i  (.X  S;:;  i  i 

and 

'  B,  D,  (wjlSI) 

for  I  i  N,  vS  closed  subset  ol  X.  la. 2 


We  then  have: 


Nu 

-  U 

IJ>J  U  ■’ 

u  ' 

■y  B, 

X>  c' 

IJ  .  '  IsU 

Using the above equations, we then obtain

h:  1, 

,u  '* ,  u  ■'  i  \  ,  0  • 

.  a  ■  1 , 

and  hence 

d  ,  i 

UU,  U  'y 

I 


4.  Conclusions 

The IFZS model represents a different and promising approach to the inverse
problem for fractal construction and image encoding. The introduction of
the grey-level maps allows one to enlarge the class of attractors. We prove
that this class is dense in $\mathcal{F}^*(X)$, the space of upper semicontinuous normal
functions, a space which is large enough for image representation. Again,
the proof of the density does not give an efficient algorithm to find the
appropriate code, but it provides a theoretical justification for the fuzzy
set approach.

We believe that we might be able to relax several conditions of the
model presented here, in order to efficiently solve the inverse problem. We
have experimental results supporting our intuition.




5.  Acknowledgements 

We wish to acknowledge support from the CONICET (Argentina) and the
Department of Applied Mathematics of the University of Waterloo. We want
to express our gratitude to all the people of the Department who made our
stay there so enjoyable. In particular we are enormously grateful to Profs.
B. Forte and E. Vrscay for their continuous support and encouragement.

We were able to attend the NATO 1991 meeting thanks to financial
support granted by Profs. B. Forte, K. Hare and E. Vrscay, all from the University
of Waterloo, and Prof. J. Byrnes/NATO 1991.

6.  Bibliography 

[1] Michael F. Barnsley and Steven Demko. Iterated function systems and
the global construction of fractals. Proc. Roy. Soc. Lond., A, 399:243-275,
1985.

[2] Michael F. Barnsley, V. Ervin, D. Hardin, and J. Lancaster. Solution of an
inverse problem for fractals and other sets. Proceedings of the National
Academy of Sciences, 83:1975-1977, 1985.

[3] Michael F. Barnsley and A. D. Sloan. A better way to compress images.
BYTE Magazine, January issue:215-223, 1988.

[4] Carlos A. Cabrelli, Bruno Forte, Ursula M. Molter, and Edward R.
Vrscay. Iterated fuzzy set systems: A new approach to the inverse
problem for fractals and other sets. Journal of Mathematical Analysis
and Applications, 1992. To appear.

[5] Persi Diaconis and Mehrdad Shahshahani. Products of random matrices
and computer image generation. In Random Matrices and Their
Applications, volume 50 of Contemporary Mathematics, pages 173-182.
American Mathematical Society, 1986. Proceedings of a Summer
Research Conference, held June 17-23.

[6] P. Diamond and P. Kloeden. Metric spaces of fuzzy sets. Fuzzy Sets and
Systems, 35:241-249, 1990.

[7] John Elton. An ergodic theorem for iterated maps. Ergodic Theory and
Dynamical Systems, 7:481-488, 1987.

[8] Kenneth J. Falconer. The Geometry of Fractal Sets. Cambridge University
Press, 1985.

[9] Kenneth J. Falconer. Fractal Geometry, Mathematical Foundations and
Applications. John Wiley & Sons, 1990.


[10] J. Hutchinson. Fractals and self-similarity. Indiana University Journal
of Mathematics, 30:713-747, 1981.

[11] O. Kaleva. Fuzzy differential equations. Fuzzy Sets and Systems,
24:301-317, 1987.

[12] P. E. Kloeden. Fuzzy dynamical systems. Fuzzy Sets and Systems, 7:275-
296, 1982.

[13] Benoit B. Mandelbrot. The Fractal Geometry of Nature. W.H. Freeman
and Company, New York, 1977.

[14] H. T. Nguyen. A note on the extension principle for fuzzy sets. Journal
of Mathematical Analysis and Applications, 64:369-380, 1978.

[15] M. L. Puri and D. A. Ralescu. The concept of normality for fuzzy random
variables. Ann. Prob., 13:1373-1379, 1985.

[16] Edward R. Vrscay and C. J. Roehrig. Iterated function systems and the
inverse problem of fractal construction using moments. In E. Kaltofen
and S. M. Watt, editors, Computers and Mathematics, pages 250-259.
Springer Verlag, 1989.

[17] Lotfi A. Zadeh. Fuzzy sets. Inform. Control, 8:338-353, 1965.

[18] Lotfi A. Zadeh. The concept of linguistic variable and its application to
approximate reasoning. Information Sciences, 8:199-249, 301-357, 1975.


Multifractal  measures 


Jacques  Peyriere 

Universite  Paris-Sud 

Departement  de  mathematiques,  bat.  425 
Unite  associee  au  CNRS  757 
91405  Orsay  Cedex  France 

peyriere@matups.matups.fr

The present text is mainly an account of joint work [6] with
G. Brown, from the University of New South Wales, and G. Michon,
from Dijon University. The multifractal formalism is described, and a
setting in which it holds is given, as well as the Michon construction of
Gibbs measures.


1.  Introduction:  the  multifractal  formalism 


Let $\{\nu_n\}$ be an increasing sequence of positive integers. The interval
$[j/\nu_n, (j+1)/\nu_n[$ is denoted by $I_{n,j}$. Let $\mu$ be a probability measure on $[0,1[$. Set

$$\tau_n(q) = -\frac{1}{\log \nu_n} \log \sum^{*}_{0 \le j < \nu_n} \mu(I_{n,j})^{q},$$

where $\sum^{*}$ means that the summation runs over the indices $j$ such that
$\mu(I_{n,j}) \ne 0$, and suppose that $\tau(q) = \lim_{n \to \infty} \tau_n(q)$ exists for every $q$ in
a certain interval $J$ of $\mathbb{R}$.

On the other hand, let us define $I_n(x)$ to be the interval of the family
$\{I_{n,j}\}_{0 \le j < \nu_n}$ which contains the point $x$ of $[0,1[$, and set, for $\alpha \ge 0$,

$$E_\alpha = \Big\{ x \in [0,1[ \ : \ \lim_{n \to \infty} \frac{\log \mu(I_n(x))}{-\log \nu_n} = \alpha \Big\}.$$

Then the multifractal formalism, as asserted in various works [11, 12,
13], and proved [1, 2, 5, 9, 17] to hold in various contexts, says that the
Hausdorff dimension of $E_\alpha$ can be computed in the following way:

$$\dim E_\alpha = \inf_{q \in J} \big( \alpha q - \tau(q) \big). \qquad (1.1)$$

In the case where $\tau$ is differentiable at a point $q_0$ and $\alpha = \tau'(q_0)$, we have
$\dim E_\alpha = \alpha q_0 - \tau(q_0)$.
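
The quantities in this formalism can be computed explicitly in a simple case. The sketch below assumes a multinomial (Bernoulli) measure on the $r$-adic intervals, with $\nu_n = r^n$ and weights chosen arbitrarily; the function names are hypothetical. It evaluates $\tau_n(q)$ directly from the partition sum, compares it with the closed form available for this particular measure, and approximates the right-hand side of (1.1) on a finite grid of values of $q$.

```python
import math
from itertools import product

def tau_n(weights, n, q):
    """tau_n(q) for a multinomial measure on the r-adic intervals of level n."""
    r = len(weights)
    total = 0.0
    for word in product(range(r), repeat=n):
        mass = math.prod(weights[k] for k in word)   # mu(I_{n,j})
        if mass > 0.0:                               # starred sum: skip null intervals
            total += mass ** q
    return -math.log(total) / math.log(r ** n)       # nu_n = r ** n

def tau(weights, q):
    """Closed form of the limit, valid for this particular family of measures."""
    r = len(weights)
    return -math.log(sum(w ** q for w in weights if w > 0.0)) / math.log(r)

def legendre(weights, alpha, q_grid):
    """Right-hand side of (1.1), with the infimum restricted to a finite grid."""
    return min(alpha * q - tau(weights, q) for q in q_grid)

weights = [0.3, 0.7]                                  # assumed example weights
q_grid = [i / 100.0 for i in range(-1000, 1001)]
# alpha0 = tau'(0) for this measure; the infimum is then attained at q = 0.
alpha0 = -sum(math.log(w) for w in weights) / (len(weights) * math.log(len(weights)))
dim0 = legendre(weights, alpha0, q_grid)
print(tau_n(weights, 8, 2.0), tau(weights, 2.0), dim0)
```

For $\alpha = \tau'(0)$ the grid infimum is attained at $q = 0$ and equals $-\tau(0) = 1$, in line with the remark on the differentiable case.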

J. S. Byrnes et al. (eds.), Probabilistic and Stochastic Methods in Analysis, with Applications, 175- 1 Kft.

© 1992 Kluwer Academic Publishers. Printed in the Netherlands.
This article is organized as follows. In the next section, the setting is
enlarged in order to deal with families of partitions the elements of which
can have different lengths. Without any assumption on the measure, it is
shown that the right-hand side of (1.1) is always an upper bound for $\dim E_\alpha$.
Moreover, the result is a bit stronger, in the sense that we can deal with the
Tricot (packing) dimension instead of Hausdorff's. We can also majorize the
dimension of a larger set than $E_\alpha$.

The third section is devoted to getting lower bounds for dimensions
once the existence of Gibbs measures is assumed.

In the fourth section, Michon's proof of the existence of Gibbs measures
for homogeneous trees is given.


2.  Upper  bounds  for  dimensions 


Let $\{\{I_{n,j}\}_{1 \le j \le \nu_n}\}_{n \ge 1}$ be a sequence of partitions of $[0,1[$ by intervals,
semi-open to the right. These partitions need not be nested. If $x \in [0,1[$, $I_n(x)$
stands for the interval of the family $\{I_{n,j}\}_{1 \le j \le \nu_n}$ which contains $x$. The
length of an interval $I$ is denoted by $|I|$. We suppose that, for any $x \in [0,1[$,
$\lim_{n \to \infty} |I_n(x)| = 0$.

We consider two indices dim and Dim which are defined as Hausdorff
and Tricot dimensions are, but by only considering coverings and packings
by intervals in the family $\{\{I_{n,j}\}_{1 \le j \le \nu_n}\}_{n \ge 1}$. An account of several notions
of dimension is given in the appendix.

We are given a probability measure $\mu$ on $[0,1[$ and a sequence $\{\lambda_n\}$
of positive integers such that $\sum_{n \ge 0} \exp(-\eta \lambda_n) < \infty$ for any $\eta > 0$.

We define the following quantities:

$$C_n(x, y) = \frac{1}{\lambda_n} \log \sum^{*}_{j} \mu(I_{n,j})^{x+1} |I_{n,j}|^{-y}$$

and

$$C(x, y) = \limsup_{n \to \infty} C_n(x, y),$$

where $\sum^{*}$ means that the summation runs over the $j$'s such that $\mu(I_{n,j}) \ne 0$.

We suppose that $C(x, y)$ is not constantly equal to 0 or $\infty$ (this imposes
the growth of the sequence $\{\lambda_n\}$), and set $\Omega = \{(x, y) \in \mathbb{R}^2 \mid C(x, y) < 0\}$.
Since $C$ is a convex function, non-increasing as a function of $x$, and
non-decreasing as a function of $y$, there exists a concave and non-decreasing
function $\varphi$ from $\mathbb{R}$ to $\overline{\mathbb{R}}$ such that the interior of $\Omega$ is identical to the set
$\{(x, y) \mid y < \varphi(x - 0)\}$. Of course, taking the limit to the left only
matters at the left end of the interval $J$ on which $\varphi$ is finite. Besides, we assume
that $0 \in J$ and, for the sake of simplicity, that $\varphi$ is differentiable on this
interval (the complete discussion, in the case where it is not so, is given in [6]).
In the case described in the introduction, where all the intervals of the
partition $\{I_{n,j}\}_{1 \le j \le \nu_n}$ have the same length, $\lambda_n = \log \nu_n$ and the limit exists,
we have $\varphi(x) = \tau(x + 1)$, where $\tau$ is the function defined in the introduction.
Set $f(\alpha) = \inf_x \big( \alpha(x + 1) - \varphi(x) \big)$.
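
The link between $f$ and formula (1.1) is a change of variable, sketched below under the identification $\varphi(x) = \tau(x+1)$ stated above and with $q = x + 1$. The last line assumes in addition that $\varphi$ is concave and differentiable at a point $\theta$ with $\alpha = \varphi'(\theta)$, so that the infimum is attained at $x = \theta$.

```latex
\begin{align*}
f(\alpha) &= \inf_{x} \big( \alpha (x+1) - \varphi(x) \big)
           = \inf_{x} \big( \alpha (x+1) - \tau(x+1) \big)
           = \inf_{q} \big( \alpha q - \tau(q) \big), \\
f(\varphi'(\theta)) &= (\theta + 1)\, \varphi'(\theta) - \varphi(\theta).
\end{align*}
```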

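On the other hand, we consider the following sets

$$B_\alpha = \Big\{ x \in [0,1[ \ \Big| \ \limsup_{n \to \infty} \frac{\log \mu(I_n(x))}{\log |I_n(x)|} \le \alpha \Big\}$$

$$B'_\alpha = \Big\{ x \in [0,1[ \ \Big| \ \liminf_{n \to \infty} \frac{\log \mu(I_n(x))}{\log |I_n(x)|} \le \alpha \Big\}$$

$$V_\alpha = \Big\{ x \in [0,1[ \ \Big| \ \liminf_{n \to \infty} \frac{\log \mu(I_n(x))}{\log |I_n(x)|} \ge \alpha \Big\}$$

$$V'_\alpha = \Big\{ x \in [0,1[ \ \Big| \ \limsup_{n \to \infty} \frac{\log \mu(I_n(x))}{\log |I_n(x)|} \ge \alpha \Big\}$$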

We  then  have  the  following  result. 

Theorem  2.1. 

1)  For  any  a,  we  have  Dim  B*  <;'  1  and  Dim  V'  $  -cp(-l ). 

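2) If $\alpha \le \varphi'(-1)$, then $\mathrm{Dim}\, B_\alpha \le f(\alpha)$ and $\dim B'_\alpha \le f(\alpha)$.

3) If $\alpha \ge \varphi'(-1)$, then $\mathrm{Dim}\, V_\alpha \le f(\alpha)$ and $\dim V'_\alpha \le f(\alpha)$.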

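Proof. Let us for instance consider the second case ($\alpha < \varphi'(-1)$), and set

$$B_\beta(n) = \{ t \in [0,1[ \ \mid \ \mu(I_n(t)) > |I_n(t)|^{\beta} \}.$$

We then have

$$B_\alpha = \bigcap_{\alpha < \beta < \varphi'(-1)} \ \bigcup_{m \ge 1} \ \bigcap_{n \ge m} B_\beta(n).$$

Fix $\alpha < \beta < \varphi'(-1)$ and $\delta > f(\beta)$, and choose $t > 0$ such that
$C(-1 + t, -\delta + \beta t) < 0$. Then

$$\sum_{j : \, \mu(I_{n,j}) > |I_{n,j}|^{\beta}} |I_{n,j}|^{\delta}
\le \sum^{*}_{j} \mu(I_{n,j})^{t} |I_{n,j}|^{\delta - \beta t}
= \exp \lambda_n C_n(-1 + t, -\delta + \beta t).$$

Therefore

$$\sum_{n} \ \sum_{j : \, \mu(I_{n,j}) > |I_{n,j}|^{\beta}} |I_{n,j}|^{\delta} < \infty.$$

So, if $\{I_j\}$ is a packing of $\bigcap_{n \ge m} B_\beta(n)$ by intervals from generations larger
than $m$, we have $\sum_j |I_j|^{\delta} < \infty$. Therefore $\Delta\big(\bigcap_{n \ge m} B_\beta(n)\big) \le \delta$ (cf.
appendix) and $\mathrm{Dim}\, B_\alpha \le \delta$. Finally $\mathrm{Dim}\, B_\alpha \le f(\alpha)$.

The other cases are handled in a similar way.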
2.1. An alternate definition of $\varphi$

Consider the following quantity:

$$K(x, y) = \lim_{\varepsilon \to 0} \ \sup_{\varepsilon\text{-packing}} \ \sum_j \mu(I_j)^{x+1} |I_j|^{-y}.$$

The function $K$ is convex, so the set $\Omega = \{K = 0\}$ is also convex. Moreover,
if $K(a, b)$ is finite, then $K(a + t, b - u) = 0$ for positive $t$ and $u$. Therefore,
there exists a concave and non-decreasing function $\varphi$ from $\mathbb{R}$ to $\overline{\mathbb{R}}$ such that

$$\mathring{\Omega} = \{ (x, y) \mid y < \varphi(x - 0) \}.$$

As previously, set $f(\alpha) = \inf_x \big( \alpha(x + 1) - \varphi(x) \big)$. In these conditions,
we have the following result.

Theorem  2.2. 

1) If $\alpha < \varphi'(-1)$, then $\mathrm{Dim}\, B'_\alpha \le f(\alpha)$.

2) If $\alpha > \varphi'(-1)$, then $\mathrm{Dim}\, V'_\alpha \le f(\alpha)$.


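Proof. In the first case ($\alpha < \varphi'(-1)$), if $\delta > f(\alpha)$ the straight half line of slope
$\alpha$ stemming from the point $(-1, -\delta)$ intersects $\mathring{\Omega}$. In other terms, there exists
a positive number $t$ such that $(-1 + t, -\delta + \alpha t) \in \Omega$. There exists $\varepsilon > 0$ such
that, for any $\varepsilon$-packing $\{I_j\}$ of $[0,1[$ by elements of the family $\{\{I_{n,j}\}_{1 \le j \le \nu_n}\}_{n \ge 1}$,
we have $\sum_j \mu(I_j)^{t} |I_j|^{\delta - \alpha t} \le 1$.

As in the preceding section, we write

$$B'_\alpha = \bigcap_{\alpha < \beta < \varphi'(-1)} \ \bigcap_{m \ge 1} \ \bigcup_{n \ge m} B_\beta(n).$$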

Therefore,  if  n  is  such  that  supj  |In,j|  $  £,  if  a  <  (3  <  (p'(-l),  and  if 
{ Ij } .  is  an  e-packing  of  the  set  Bp(n),  we  have 


So, $\mathrm{Dim}\, B_\beta(n) \le \delta$, and $\mathrm{Dim}\, B'_\alpha \le f(\alpha)$.

The second case is handled in the same way.


Remark 2.3. We could also have defined $K(x, y)$ to be:

$$\lim_{\varepsilon \to 0} \ \inf \ \sum_j \mu(I_j)^{x+1} |I_j|^{-y},$$

where the inf is taken over the $\varepsilon$-coverings $\{I_j\}$ of $[0,1[$ by elements of the
family $\{\{I_{n,j}\}_{1 \le j \le \nu_n}\}_{n \ge 1}$. The function $K$ may be no longer convex, but the boundary
of $\{K = 0\}$ is still defined by a non-decreasing function $\varphi$ from $\mathbb{R}$ to $\overline{\mathbb{R}}$. If $f$ is
defined as above, then a similar conclusion holds by replacing Dim by dim.
3. Lower bounds for dimensions

The notation is the same as in the previous section. If $u$ and $v$ are two
functions, the relation $u \approx v$ means that there exists a positive constant $K$
such that $K^{-1} u \le v \le K u$.

Theorem 3.1. Let $\theta \in \mathbb{R}$ and suppose that $\varphi'(\theta)$ exists and that there is
a measure $\mu_\theta$ such that $\mu_\theta(I_{n,j}) \approx \mu(I_{n,j})^{\theta + 1} |I_{n,j}|^{-\varphi(\theta)}$. Then we have
$\dim E_{\varphi'(\theta)} = f(\varphi'(\theta))$.

The measure $\mu_\theta$, in analogy with statistical mechanics, is called a Gibbs
measure.

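Proof. Consider the following quantities

$$\widetilde{C}_n(x, y) = \frac{1}{\lambda_n} \log \sum^{*}_{j} \mu(I_{n,j})^{x} |I_{n,j}|^{-y} \mu_\theta(I_{n,j})$$

and

$$\widetilde{C}(x, y) = \limsup_{n \to \infty} \widetilde{C}_n(x, y).$$

We have

$$\widetilde{C}_n(x, y) = \frac{1}{\lambda_n} \log \sum^{*}_{j} \mu(I_{n,j})^{x + \theta + 1} |I_{n,j}|^{-y - \varphi(\theta)} + o(1)$$

and

$$\widetilde{C}(x, y) < 0 \iff C(x + \theta, y + \varphi(\theta)) < 0.$$

Therefore

$$\{ \widetilde{C} < 0 \} = \{ (x, y) \mid y < \varphi(x + \theta) - \varphi(\theta) \}.$$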

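Lemma 3.2. As $n$ goes to infinity, $\dfrac{\log \mu(I_n(t))}{\log |I_n(t)|} \to \varphi'(\theta)$ for $\mu_\theta$-almost every $t$.

Proof. If $\alpha < \varphi'(\theta)$ then there exists $t > 0$ such that $\widetilde{C}(t, \alpha t) < 0$. Then

$$\mu_\theta \{ t \mid \mu(I_n(t)) > |I_n(t)|^{\alpha} \}
= \sum_{j : \, \mu(I_{n,j}) > |I_{n,j}|^{\alpha}} \mu_\theta(I_{n,j})
= \sum_{\text{idem}} |I_{n,j}|^{\alpha t} |I_{n,j}|^{-\alpha t} \mu_\theta(I_{n,j})$$

$$\le \sum^{*}_{j} \mu(I_{n,j})^{t} |I_{n,j}|^{-\alpha t} \mu_\theta(I_{n,j})
= \exp \lambda_n \widetilde{C}_n(t, \alpha t)$$

(this inequality is a large deviation type result [7], as well as the analogous
one in the proof of the theorem concerning upper bounds). Therefore

$$\sum_{n} \mu_\theta \Big\{ t \ \Big| \ \frac{\log \mu(I_n(t))}{\log |I_n(t)|} < \alpha \Big\} < \infty,$$

so $\liminf_{n \to \infty} \dfrac{\log \mu(I_n(t))}{\log |I_n(t)|} \ge \alpha$ for $\mu_\theta$-almost every $t$. The upper limit is treated similarly.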

We can now complete the proof of the theorem. It results from the
above lemma first that $\mu_\theta(E_{\varphi'(\theta)}) = 1$ and secondly, taking into account the
properties of $\mu_\theta$, that

$$\frac{\log \mu_\theta(I_n(t))}{\log |I_n(t)|} \to (\theta + 1) \varphi'(\theta) - \varphi(\theta)$$

for $\mu_\theta$-almost every $t$. Therefore, due to the Billingsley-Kinney-Pitcher theorem,
we have $\dim E_{\varphi'(\theta)} \ge f(\varphi'(\theta))$. The equality then results from the previous
section.
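
As a numerical sanity check of the last step, the sketch below takes the multinomial case as an assumed example: weights $p_k$ on $r$-adic intervals, with $\lambda_n = n \log r$, so that $\varphi(x) = -\log_r \sum_k p_k^{x+1}$. In that case the measure with weights proportional to $p_k^{\theta+1}$ satisfies the Gibbs condition exactly, and its dimension is given by the standard entropy formula for Bernoulli measures. The function names and the sample weights are hypothetical.

```python
import math

def phi(weights, x):
    """phi(x) = -log_r(sum_k p_k^(x+1)) for a multinomial measure (assumed setting)."""
    r = len(weights)
    return -math.log(sum(w ** (x + 1.0) for w in weights)) / math.log(r)

def phi_prime(weights, x):
    """Derivative of phi, obtained by differentiating the formula above."""
    r = len(weights)
    s = sum(w ** (x + 1.0) for w in weights)
    return -sum(w ** (x + 1.0) * math.log(w) for w in weights) / (s * math.log(r))

def gibbs_weights(weights, theta):
    """Weights of the candidate Gibbs measure: p_k^(theta+1), normalised."""
    s = sum(w ** (theta + 1.0) for w in weights)
    return [w ** (theta + 1.0) / s for w in weights]

def entropy_dimension(weights):
    """Entropy of the weights divided by log r."""
    r = len(weights)
    return -sum(w * math.log(w) for w in weights) / math.log(r)

weights = [0.2, 0.3, 0.5]      # assumed example weights, all positive
theta = 1.5
alpha = phi_prime(weights, theta)
f_alpha = (theta + 1.0) * alpha - phi(weights, theta)   # f(phi'(theta))
dim_gibbs = entropy_dimension(gibbs_weights(weights, theta))
print(alpha, f_alpha, dim_gibbs)
```

The two printed dimensions agree, which is the identity $f(\varphi'(\theta)) = (\theta + 1)\varphi'(\theta) - \varphi(\theta)$ read off on the Gibbs measure.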

As a consequence, the Hausdorff or Tricot dimensions of all the sets
$E_\alpha$, $B_\alpha$, $V_\alpha$, $B'_\alpha$, and $V'_\alpha$ are equal to $f(\alpha)$ under the same conditions. This
generalizes some results of Besicovitch [3], Eggleston [10], and Volkman [28]
on the dimension of sets defined in terms of frequency of digits. This also
accounts  for  some  results  in  [9]  and  some  work  on  'cookie-cutters'  [1,  .3]. 


4.  Existence  of  Gibbs  measures 

In this section we suppose that the sequence $\{\{I_{n,j}\}_{1 \le j \le \nu_n}\}_{n \ge 1}$ of partitions
has the following properties: each element of the $(n+1)$-st partition is
contained in one element of the $n$-th one, and each element of the $n$-th
partition is split into a fixed number $p$ of elements of the $(n+1)$-st one.
Obviously, this imposes $\nu_n = p^n$. We are going to use another indexation

of  the  intervals  (Injl:  the  .ervals  [I^.jl  will  be  denoted  by  li,,,,  with 
0  $  li  <  P,  in  such  a  way  that  c  f,;  and  so  on. 

Let $\mathcal{A}$ be the set of words over the alphabet $\{0, \ldots, p - 1\}$. The
concatenation, just denoted by juxtaposition, endows $\mathcal{A}$ with a semigroup
structure. The empty word, which is the unit, is denoted by $\epsilon$. The set
of words of length $n$ is denoted by $\mathcal{A}_n$; it indexes the elements of the $n$-th
partition. If $a \in \mathcal{A}$, instead of writing $\mu(I_a)$ we shall simply write $\mu(a)$. In
these conditions, for every $a \in \mathcal{A}$ and every $n$, we have $\sum_{b \in \mathcal{A}_n} \mu(ab) = \mu(a)$.

We suppose that $\mu$ is quasi-Bernoulli, i.e., there exists a positive number
$M$ such that, for any $a$ and $b$ in $\mathcal{A}$, we have
$M^{-1} \mu(a) \mu(b) \le \mu(ab) \le M \mu(a) \mu(b)$.

We also define a mapping $l$ from $\mathcal{A}$ to $\mathbb{R}$: $l(a) = |I_a|$. We assume that $l$
is almost multiplicative, i.e., there exists a positive constant $L$ such that, for
any $a$ and $b$ in $\mathcal{A}$, we have $L^{-1} l(a) l(b) \le l(ab) \le L \, l(a) l(b)$.
Under these conditions, G. Michon [21, 22] proved that the 'free energy'
exists and that there are Gibbs measures. We are now going to give his proof.

Proposition 4.1. For every $x$ and $y$ in $\mathbb{R}$, the ratio

$$\frac{1}{n} \log \sum_{a \in \mathcal{A}_n} l(a)^{-y} \mu(a)^{x+1}$$

has a limit, denoted by $C(x, y)$, as $n$ goes to infinity.

Proof. By replacing $l$ by $l^{-y} \mu^{x}$, it is enough to consider the case $x = 0$, $y = -1$.
Set

$$Z_n = \sum_{a \in \mathcal{A}_n} l(a) \mu(a), \quad \text{and} \quad C_n = \frac{1}{n} \log(Z_n).$$

We have

$$Z_{m+n} = \sum_{a \in \mathcal{A}_m} \sum_{b \in \mathcal{A}_n}
\frac{l(ab)}{l(a) l(b)} \, \frac{\mu(ab)}{\mu(a) \mu(b)} \, l(a) \mu(a) \, l(b) \mu(b).$$

Therefore, we have $|\log Z_{m+n} - \log Z_m - \log Z_n| \le \log(ML)$. It results that
$C_n$ has a limit $C$ as $n$ goes to infinity. Moreover, we have
$|C_n - C| \le \frac{1}{n} \log ML$.
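
The passage from the almost-additivity of $\log Z_n$ to the existence of the limit can be spelled out as follows. This is a sketch of the standard subadditivity argument (Fekete's lemma), written with the shorthand $a_n = \log Z_n$ and $c = \log(ML)$, which are notations introduced here for convenience.

```latex
Let $a_n = \log Z_n$ and $c = \log(ML)$, so that $|a_{m+n} - a_m - a_n| \le c$.
Then $b_n = a_n + c$ is subadditive and $b'_n = a_n - c$ is superadditive:
\begin{align*}
b_{m+n}  &= a_{m+n} + c \le a_m + a_n + 2c = b_m + b_n, \\
b'_{m+n} &= a_{m+n} - c \ge a_m + a_n - 2c = b'_m + b'_n .
\end{align*}
By Fekete's lemma, $b_n / n \to \inf_n b_n / n$ and $b'_n / n \to \sup_n b'_n / n$.
Both sequences have the same limit as $a_n / n$; call it $C$. Hence, for every $n$,
\[
\frac{a_n - c}{n} \le C \le \frac{a_n + c}{n},
\qquad \text{i.e.} \qquad
\Big| \frac{1}{n} \log Z_n - C \Big| \le \frac{1}{n} \log(ML).
\]
```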

Let  us  notice  that  if  we  set  l<,(b)  -  l(ab),  1|g),  (for  o  and  b  in  ./I),  we 
have  I  (b)L.  (cl  ^  to  (be)  y  1 ‘■’l„(b)l„(c)  for  u,  b,  and  c  in  /t.  Similarly 
for  u. 

For  any  a  in  yt,  and  .s  in  iB,  set  l.,(b)  u„(b),  and 


Z<,1S)  ^  oZnC  (4,1) 

II  -0 

It  results  from  the  above  remark  that,  for  any  n  and  for  anv  a,  we  have 
K"  'iZ„  $  nZ„  e;  KiZn,  vvith  K  -  1  M.  Therefore  lim„  , ^  Mogi.Z,, 
does  not  depend  on  a  and  is  equal  to  what  we  called  C  in  the  proof  of  the 
above  propo-sition.  Moreover,  [ log„Z„  Cj  y  logKand  K  ^expnC  S; 
<,Zn  S:  K‘  expnC  for  any  n.  So  the  series  (4.1 )  converges  for  .s  >  C.  From 
these  last  inequalities  it  results  also  that 

"  .-Zo(s).  -  - 

1  exp(C  .s)  I  explC  -  s) 


Theorem  4.2.  For  every  x  and  y  in  91,  there  exist  a  constant  c  and  a  measure 
such  that,  for  any  a  in  yt„,  we  have 


'L(q|-  ''  u(a) 


.  I  1 


-  n  (. 


''  m.u(a)  ^  c  Ual  m(ci) 


X  »  I 


n  C'<  V  ,\t  1 
Proof. As previously, it is enough to consider the case $x = 0$, $y = -1$.

Let  us  denote  by  l„  the  following  mapping  from  fO,  1|  to  tH:  l„  It  i 
[I„(t)j.  Let  us  define  a  family  of  functions  from  [0, 1]  to  'Jt  *  in  the  following 
way  cps  =  lo  +  li  +  •.  Obviously,  we  have  J  tps  dy  "  Z,  (s  I.  This 

allows  us  to  define  the  family  ^  u  (s  >  C)  of  probabilit\  measures 
on  |0,  li. 

If  a  t  A,,,  we  denote  by  Qj  (0  §  i  ^  n)  the  word  formed  bv  the  i  first 
letters  of  q.  In  these  conditions,  we  have,  denoting  PJL,  i  simple  by  Pjo  L 

(s)  P,s(a)  --  y(Q)  ^  1(q,|c  ^  Uul’!u!Gh,c  ' ■■ 

i'-  )•  n  \  C  b-  'i 


In  other  terms 


PJu) 


U  ( Q  I 
Z.  Is ) 


Y_  llu,!c 


llci'l  lilt]  I  c 


Z.,isl 
Z.  isf 


When  s  goes  tt'  utfinity,  P^  has  a  weak  limit  point  >  at  least  iful,  \se 
know  that,  as  s  goes  to  infinity,  so  does  Z.  Is)  and  that  the  ratio  '  s :  /,  ,  < 
stays  between  K  '  and  K  This  means  that  we  ha\e,  It'r  o  -I,, , 


K 


I 


e  ta  I 
lIcilufuA’ 


K-’. 


Remark 4.3. The case of Riesz products [8] is not handled by this proof.


5.  Example 

One of the paradigms of multifractality is the multinomial measures, of which
we give a generalization in this section.

Let X be the simplex $\{(x_1, \ldots, x_p) : x_1 + \cdots + x_p = 1,\ x_i \ge 0$ for
$i = 1, \ldots, p\}$ $(p \ge 2)$. Consider a sequence $(m_n, l_n)_{n \ge 1}$ of elements of $X \times X$.

We assume that this sequence has a continuous measure of repartition L.
This means that there exists a continuous probability measure L on the space
$X \times X$ such that, for any open set $U \subset X \times X$ the boundary of which is of zero
L-measure, we have

$\lim_{n \to \infty} \frac{1}{n}\, \#\{ j \le n : (m_j, l_j) \in U \} = L(U).$

Moreover we assume that the boundary of $X \times X$ is of zero L-measure.

As in the section concerning the construction of Gibbs measures,
we consider subintervals of [0, 1[ indexed by words over the alphabet
$\{0, \ldots, p-1\}$: $I_\emptyset = [0, 1[$, and the length of $I_{x_1 \ldots x_n}$ is $\prod_{j=1}^{n} l_{j,x_j}$.
Define a measure μ on [0, 1[ in the following way:

$\mu(I_{x_1 \ldots x_n}) = \prod_{j=1}^{n} m_{j,x_j}.$


It  can  be  verified  that  we  have 

lim  -log^nlU, . x„)’‘"|Ix . x„r 

n-*cc  n 


log  Y. 

JxyX 


(u^"v^«)  dyu.v). 


So  we  have  an  explicit  expression  for  C|x,y).  On  the  other  hand,  if 
wi’  keep  the  same  U,  but  if  mn  is  replaced  by  ^ i  -  to 

exponentiate  a  vector  means  to  exponentiate  each  of  its  components)  and 
perform a similar construction, we get a measure which is the Gibbs
measure corresponding to (x, y). Therefore, in this situation the multifractal
formalism holds.


6.  Appendix:  Hausdorff  and  Tricot  dimensions 


6.1.  Hausdorff  dimension 

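Let E be a subset of [0, 1] and α a positive number. Set

$H_\alpha(E) = \lim_{\varepsilon \to 0} \inf \bigl\{ \sum_j |I_j|^\alpha : E \subset \bigcup_j I_j,\ |I_j| \le \varepsilon \bigr\}.$

If $H_\alpha(E) < \infty$ then $\beta > \alpha \Rightarrow H_\beta(E) = 0$. So there is a cutoff $\alpha_0$ such that

$\alpha < \alpha_0 \Rightarrow H_\alpha(E) = \infty$ and $\alpha > \alpha_0 \Rightarrow H_\alpha(E) = 0.$

This number $\alpha_0$ is, by definition, the Hausdorff dimension of E.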

Another dimensional index is of wide use. Let $N_\varepsilon(E)$ be the minimum
number of elements of coverings of E by intervals of lengths less than ε,
and set

$\Delta(E) = \limsup_{\varepsilon \to 0} \frac{\log N_\varepsilon(E)}{-\log \varepsilon}.$

This index has been considered by many authors and bears several
names: Bouligand-Minkowski dimension, entropy dimension, logarithmic
index, box dimension ... In fact these indices differ in a general metric
space. Obviously we have $\dim E \le \Delta(E)$.
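As an illustration of the box dimension just defined, here is a minimal Python sketch for the middle-thirds Cantor set. The example set, the function names and the use of a fixed triadic grid (rather than minimal coverings by intervals of length less than ε) are assumptions made for the illustration; grid counts and minimal covering counts differ only by a bounded factor, which does not affect the limit.

```python
import math

def cantor_points(depth):
    """Left endpoints of the depth-th stage of the middle-thirds Cantor set,
    stored as integers k standing for the point k / 3**depth."""
    points = [0]
    for level in range(depth):
        # At each level a ternary digit 0 or 2 is appended.
        step = 2 * 3 ** (depth - level - 1)
        points = [p + d for p in points for d in (0, step)]
    return points

def box_count(points, depth, k):
    """Number of grid intervals of length 3**-k containing at least one point."""
    cell = 3 ** (depth - k)
    return len({p // cell for p in points})

def box_dimension_estimate(depth, k):
    """log N_eps / (-log eps) evaluated at the single scale eps = 3**-k."""
    n_boxes = box_count(cantor_points(depth), depth, k)
    return math.log(n_boxes) / (k * math.log(3))

estimate = box_dimension_estimate(depth=10, k=8)
print(estimate, math.log(2) / math.log(3))
```

For this self-similar set the ratio is the same at every triadic scale, so a single scale already gives log 2 / log 3; for a general set one would look at the behaviour as ε tends to 0.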

The following observation gives a way of getting a lower bound for
the Hausdorff dimension: if there exists a measure μ satisfying a Hölder
condition of order α (i.e., $\mu(I) \le C|I|^\alpha$ for every interval I) and such that
$\mu(E) > 0$, then $\dim E \ge \alpha$. Indeed if $(I_j)$ is a covering of E by intervals,
we have

$0 < \mu(E) \le \sum_j \mu(I_j) \le C \sum_j |I_j|^\alpha$

which proves the above assertion. In fact a refinement of this argument gives
the following lemma due to Kinney and Pitcher and, in a more general form,
to Billingsley [4].

Lemma 6.1. Let μ be a probability measure. If

$\mu(E) > 0$ and $E \subset \bigl\{ t : \liminf_{I \ni t,\ |I| \to 0} \frac{\log \mu(I)}{\log |I|} \ge \alpha \bigr\}$

then $\dim E \ge \alpha$.


6.2.  Tricot  dimension 


An ε-packing of E is a collection of mutually disjoint intervals intersecting
E. The following property of box dimension can be found in [26]:

$\Delta(E) = \inf \bigl\{ \alpha : \limsup_{\varepsilon \to 0}\, \sup \bigl\{ \sum_j |I_j|^\alpha : (I_j) \text{ being an } \varepsilon\text{-packing of } E \bigr\} = 0 \bigr\}$
$\qquad = \sup \bigl\{ \alpha : \limsup_{\varepsilon \to 0}\, \sup \bigl\{ \sum_j |I_j|^\alpha : (I_j) \text{ being an } \varepsilon\text{-packing of } E \bigr\} = \infty \bigr\}.$

One drawback of the box dimension is that it does not distinguish a set from
its closure. For instance, it assigns 1 to the dimension of the rationals. This
led Tricot [26, 2,^] to introduce the following concept:

$\operatorname{Dim} E = \inf \bigl\{ \sup_n \Delta(E_n) : E \subset \bigcup_n E_n \bigr\}.$

Obviously $\dim E \le \operatorname{Dim} E$. A related notion has been introduced by Sullivan
[24]. An account of this notion of dimension and of connected outer measures
can be found in [27].

The index Dim has the same regularity properties as Hausdorff dimension:
if $F \subset E$ then $\operatorname{Dim} F \le \operatorname{Dim} E$, and, if E is the union of a sequence $E_n$
of sets, then $\operatorname{Dim} E = \sup_n \operatorname{Dim} E_n$.


7.  Final  remarks 

There are many developments which we are not going to discuss about
multifractals,  in  particular  concerning  further  interpretations  [  l.'v,  16,  18,  l^F, 
20] of the function f(α), especially when it assumes negative values.

In  a  recent  work  Muzy  et  al.  [23]  have  adapted  this  formalism  to  handle 
another  situation  by  replacing  indicator  functions  of  intervals  by  wavelets. 

Finally, the thermodynamical formalism has been used to study harmonic
measures [14].

Applications of Gabor and wavelet expansions
to the Radon transform†

We investigate the relationship between the Radon transform and certain
phase space localization functions, namely the continuous Gabor and
wavelet transforms. We derive inversion formulas for the Radon transform
based on the Gabor and wavelet transform. Some of these formulas give a
direct reconstruction of f or of Λf from the Radon transform data. Others
show how the Gabor and wavelet transforms of f or Λf can be recovered
directly from the Radon transform data. We suggest ways in which these
formulas can lead to efficient reconstruction algorithms and can be applied
to noise reduction in reconstructed images.


1.  Introduction 

The Radon transform is a mathematical tool which is used to describe an
image (which may be thought of as a function of several, typically two,
variables) in terms of intensity averages over lines or hyperplanes in several
directions. Typically such averages can be easily measured while the function
itself is inaccessible. In computerized tomography (CT) scanners, for example,
one wishes to determine the tissue density function in a cross-section of
the human body from non-invasive measurements. The basic problem is the
accurate recovery of the unknown function or at least relevant features of
the unknown function in a stable fashion and requiring the fewest possible
measurements. In addition to medical applications, the Radon transform
has also been used in astronomy, electron microscopy, optics, geophysics [7].
The Radon transform has recently been proposed as the basis of a recovery
instrument for space plasmas, and in determining the chemical composition
of flames [23].

† This paper is a report of joint work being undertaken by the author together with Carlos
Berenstein of the University of Maryland.


J  .S'.  Hyrtu's  et  al.  ).  ProUthilistu  and  Stochastu'  Methods  in  Analysis,  with  Applu  alions .  1  S7-2l).S. 
Given a function f defined on $\mathbb{R}^d$, its Radon transform, Rf, consists
of the average of f over all hyperplanes in $\mathbb{R}^d$. For example, the Radon
transform in the plane (d = 2) would consist of the integrals over all lines
of a function defined in the plane. In planar imaging, these averages can
be found by measuring the attenuation of a beam passing through a
two-dimensional slice of the body. In some applications, one also needs to
consider integration over k-planes in $\mathbb{R}^d$. For instance, when d = 3, NMR
scanners can be modelled in terms of the hyperplane Radon transform, but
emission tomography leads naturally to integration on straight lines. This is
usually called the X-ray transform. For simplicity, the rest of the discussion
in this paper will be about the hyperplane transform. In this case, the
hyperplane averages of f are organized as follows.

$R_\theta f(s) = \int_{\theta^\perp} f(s\theta + y)\, dy$
where $\theta \in S^{d-1}$ and $s \in \mathbb{R}$.

The adjoint of the Radon transform is commonly referred to as the
backprojection operator and is defined as follows. For a function h defined
on $S^{d-1} \times \mathbb{R}$,

$R^{\#} h(x) = \int_{S^{d-1}} h(\theta, x \cdot \theta)\, d\theta$
with $x \in \mathbb{R}^d$.

In  even  dimensions,  the  Radon  transform  is  non-local  in  the  sense 
that  the  recovery  of  f(x)  requires  knowledge  of  the  integrals  of  f  over  all 
hyperplanes.  By  contrast  in  odd  dimensions  recovery  of  f(x)  requires  only 
the  integrals  of  f  over  hyperplanes  passing  through  a  neighborhood  of  x.  This 
is  an  important  consideration  in  medical  imaging  as  one  wants  to  expose  the 
patient  to  as  little  radiation  as  possible. 

An approach which tries to preserve locality in even dimensions has
recently been proposed in [9]. This involves the recovery of Λf, where
$\Lambda = \frac{1}{2\pi}(-\Delta)^{1/2}$ and Δ is the Laplacian. In even dimensions, it is
possible to recover Λf(x) from integrals of f on hyperplanes passing through
a neighborhood of x. Since Λ acts as a differentiation operator, the image
Λf tends to highlight edges in f(x), i.e., regions of sharp changes in tissue
density, and also to reveal more clearly details such as small blood vessels.
This approach is known as local tomography or Lambda tomography [18, 9].

The  Gabor  transform,  a  variant  of  which  is  known  as  the  short-time 
Fourier  transform,  was  introduced  in  1946  by  D.  Gabor  [13]  as  a  tool  in 
communication  theory.  It  and  its  variants  have  long  been  used  by  engineers 
in  digital  signal  processing  applications.  More  recently,  the  Gabor  transform 
has been used as a tool in image analysis, compression, and segmentation
[6, 22]. The transform compares a given signal to shifts and modulations of
a  fixed  window  function,  g,  that  is,  to 

$g_{x,\xi}(t) = e^{2\pi i \xi \cdot (t - x)}\, g(t - x).$  (1.1)

In  this  way,  the  transform  gives  a  time-frequency  picture  of  the  signal.  Gabor 
used  a  Gaussian  as  a  window  in  order  to  achieve  the  best  possible  joint 
localization  in  time  and  frequency.  With  the  short-time  Fourier  transform,  a 
box  function  is  used  as  a  window.  Discrete  versions  of  the  Gabor  transform, 
known  as  frames  [8],  exist  which  permit  stable  and  efficient  expansions  and 
reconstruction  of  functions  [4].  However,  such  developments  are  necessarily 
overcomplete [1,2]. Recently an orthonormal basis closely related to frames
of  Gabor  functions  has  been  discovered  [5].  Such  bases  are  known  as  Wilson 
bases  and  consist  of  linear  combinations  of  pairs  of  Gabor  functions.  The 
Wilson  basis  functions  are  real  valued,  and  their  close  relation  to  Gabor 
functions  permits  easy  computation. 

The wavelet transform, introduced in [16], has been an increasingly
popular tool for signal and image analysis. The transform compares a signal
to shifts and dilates of a fixed function, the mother wavelet ψ, that is, to

-  b)/Q)  (1.2) 

with $a > 0$ and $b \in \mathbb{R}^d$. As a time-frequency localization operator, the
wavelet transform is fundamentally different from the Gabor transform. By
using dilations the wavelet transform can achieve arbitrarily fine time
localization while still giving a complete representation of the signal. Remarkable
wavelet orthonormal bases have been constructed consisting of smooth and
rapidly decaying (even compactly supported) functions. The expansion and
reconstruction of a signal in such a basis is very efficient numerically, and in
fact is faster than the FFT.

In this paper, we investigate some of the connections between the
Gabor and wavelet transforms and the Radon transform. We will derive
inversion formulas for the Radon transform based on the Gabor and wavelet
transforms. One type is a direct inversion formula based on the development
of $R_\theta f(s)$ for each θ in a series of the form

$R_\theta f(s) = \sum_n \sum_m c_{n,m}(\theta)\, h_{n,m}(s)$  (1.3)

where the $h_{n,m}$ can be a collection of Gabor functions, a Wilson basis, or
a wavelet basis. In the case of Gabor or Wilson functions, the advantage
lies in the fact that the basis functions are known explicitly so there are no
problems of interpolation in the reconstruction scheme. In the wavelet case,
the basis functions do not in general admit a closed form analytic expression,
so numerical approximations must be used. In this case, however, the
computation of the coefficients is extremely fast.

Another type of inversion formula presented here is based on the
method of filtered backprojection which is essentially an implementation
of the formula (see Proposition 2.4)

$K * f = R^{\#}(k * Rf)$  (1.4)

where $R^{\#} k = K$ [19]. The idea is to compute (1.4) for functions K which
approximate the δ-function. Such K are often referred to as "point-spread
functions." Since both the Gabor and wavelet transforms can be realized as
convolution operators, it seems natural to ask whether one can recover these
transforms directly from the Radon transform data. In the Gabor case, the
kernels K are modulated Gaussians and in the wavelet case are dilates of a
fixed mother wavelet. In Sections 2 and 3, formulas are derived which in
some cases allow the recovery of the Gabor and wavelet transforms directly
from the one-dimensional transforms applied to the data $R_\theta f$ for each θ.

In both cases, the formulas are local in odd dimensions and in even
dimensions Λf can be recovered in a local fashion. Also, the formulas allow
the selective recovery of f or Λf at certain frequencies (the Gabor case) or at
certain resolutions (the wavelet case). This feature can be useful in the noise
reduction of tomographic images [21].

In this paper we use the following notations. We denote by $\mathbb{R}^d$
d-dimensional Euclidean space and by $\hat{\mathbb{R}}^d$ its dual space. The Fourier
transform in $\mathbb{R}^d$ is defined by $\hat f(\xi) = \int_{\mathbb{R}^d} e^{-2\pi i x \cdot \xi} f(x)\, dx$ whenever f is integrable
and as an appropriate limit when it is not. We denote by $S(\mathbb{R}^d)$ the space
of infinitely differentiable functions which, with all of their derivatives,
decay faster than any inverse polynomial. This space is commonly referred to
as the Schwartz space.

We begin with a review of the definition and some basic properties of
the Radon transform.

2.  The  Radon  transform 

2.1.  Definitions  and  preliminaries 

Definition 2.1. Given $f \in S(\mathbb{R}^d)$, we define the Radon transform, Rf, of f by

$Rf(\theta, s) = R_\theta f(s) = \int_{\theta^\perp} f(s\theta + y)\, dy$

where $\theta \in S^{d-1}$, $s \in \mathbb{R}$.
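To make the definition concrete, the following minimal Python sketch approximates a single value $R_\theta f(s)$ in the plane (d = 2) by a Riemann sum along the line $\{x : x \cdot \theta = s\}$. The test function (the Gaussian $e^{-\pi |x|^2}$), the function names, the truncation of the line to a finite segment and the step size are all assumptions chosen for the illustration; for this Gaussian the exact value is $e^{-\pi s^2}$ for every direction.

```python
import math

def gaussian(x1, x2):
    """Test function f(x) = exp(-pi |x|^2) on the plane."""
    return math.exp(-math.pi * (x1 * x1 + x2 * x2))

def radon_line_integral(f, theta, s, half_width=6.0, step=0.01):
    """Riemann-sum approximation of the integral of f(s*theta + y*theta_perp) dy,
    where theta = (cos t, sin t) and theta_perp = (-sin t, cos t)."""
    c, sn = math.cos(theta), math.sin(theta)
    n = int(round(2 * half_width / step))
    total = 0.0
    for i in range(n + 1):
        y = -half_width + i * step
        # Point on the line {x : x . theta = s} at signed distance y from s*theta.
        total += f(s * c - y * sn, s * sn + y * c)
    return total * step

value = radon_line_integral(gaussian, theta=0.7, s=0.4)
expected = math.exp(-math.pi * 0.4 ** 2)
print(value, expected)
```

Sampling many angles and offsets in this way produces the discrete data set usually called a sinogram; the sketch makes no attempt at efficiency.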

Definition 2.2. Given h a bounded continuous function on $\mathbb{R}$, we define for
each $\theta \in S^{d-1}$ the operator $R_\theta^{\#}$ by

$R_\theta^{\#} h(x) = h(x \cdot \theta).$

For h bounded on $S^{d-1} \times \mathbb{R}$ we define the operator $R^{\#}$ by

$R^{\#} h(x) = \int_{S^{d-1}} h(\theta, x \cdot \theta)\, d\theta.$

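Note that given $f \in S(\mathbb{R}^d)$ and $h \in S(\mathbb{R})$, we have that

$\int_{\mathbb{R}} R_\theta f(s)\, h(s)\, ds = \int_{\mathbb{R}} \int_{\theta^\perp} f(s\theta + y)\, dy\; h(s)\, ds$
$\qquad = \int_{\mathbb{R}^d} f(x)\, h(x \cdot \theta)\, dx = \int_{\mathbb{R}^d} f(x)\, R_\theta^{\#} h(x)\, dx.$

Also, for $f \in S(\mathbb{R}^d)$, and $h \in S(S^{d-1} \times \mathbb{R})$, integrating the above over
$S^{d-1}$ gives

$\int_{S^{d-1}} \int_{\mathbb{R}} Rf(\theta, s)\, h(\theta, s)\, ds\, d\theta = \int_{\mathbb{R}^d} f(x)\, R^{\#} h(x)\, dx.$

In this sense, $R_\theta^{\#}$ and $R^{\#}$ are the formal adjoints of $R_\theta$ and R.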

We now collect some basic properties of the Radon transform whose
proofs can be found in any standard text on the subject, e.g., [10].


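Proposition 2.3. Let $f, g \in L^1(\mathbb{R}^d)$. Then for a.e. θ and s,

$R_\theta(f * g)(s) = R_\theta f * R_\theta g(s)$

where the convolution on the left is in $\mathbb{R}^d$ and that on the right is in $\mathbb{R}$.

Proof. Suppose that $f, g \in S(\mathbb{R}^d)$. Writing $t = \tau\theta + t'$ with $\tau \in \mathbb{R}$ and $t' \in \theta^\perp$,

$R_\theta(f * g)(s) = \int_{\theta^\perp} \int_{\mathbb{R}^d} f(t)\, g(s\theta + y - t)\, dt\, dy$
$\qquad = \int_{\mathbb{R}} \int_{\theta^\perp} f(\tau\theta + t') \int_{\theta^\perp} g((s - \tau)\theta + y - t')\, dy\, dt'\, d\tau$
$\qquad = \int_{\mathbb{R}} R_\theta f(\tau)\, R_\theta g(s - \tau)\, d\tau = R_\theta f * R_\theta g(s).$

■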


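Proposition 2.4. Let $f \in S(\mathbb{R}^d)$, $g \in L^\infty(\mathbb{R})$. Then for each $\theta \in S^{d-1}$,

$(R_\theta^{\#} g) * f = R_\theta^{\#}(g * R_\theta f)$  (2.1)

where the convolution on the left is in $\mathbb{R}^d$ and that on the right is in $\mathbb{R}$.
If $g \in L^\infty(S^{d-1} \times \mathbb{R})$, then

$R^{\#} g * f = R^{\#}(g * Rf).$  (2.2)

Proof. Assume that $f \in S(\mathbb{R}^d)$. For $\theta \in S^{d-1}$ fixed, let $t = \tau\theta + t'$ and
$x = s\theta + x'$. Then

$\int_{\mathbb{R}^d} g(t \cdot \theta)\, f(x - t)\, dt = \int_{\theta^\perp} \int_{\mathbb{R}} g(\tau)\, f((s - \tau)\theta + (x' - t'))\, d\tau\, dt'$
$\qquad = \int_{\mathbb{R}} g(\tau) \int_{\theta^\perp} f((s - \tau)\theta + (x' - t'))\, dt'\, d\tau$
$\qquad = \int_{\mathbb{R}} g(\tau)\, R_\theta f(s - \tau)\, d\tau = g * R_\theta f(x \cdot \theta).$

This proves (2.1). Integrating the above formula over $\theta \in S^{d-1}$ gives
(2.2) for $f \in S(\mathbb{R}^d)$. ■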

2.2.  Inversion  of  the  Radon  transform 

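Proposition 2.5 (The Fourier Slice Theorem). Let $f \in S(\mathbb{R}^d)$. Then for
$\theta \in S^{d-1}$, $\gamma \in \mathbb{R}$,

$(R_\theta f)^\wedge(\gamma) = \hat f(\gamma\theta).$  (2.3)

Proof.

$\int_{\mathbb{R}} R_\theta f(s)\, e^{-2\pi i s\gamma}\, ds = \int_{\mathbb{R}} \int_{\theta^\perp} f(s\theta + y)\, e^{-2\pi i s\gamma}\, dy\, ds$
$\qquad = \int_{\mathbb{R}^d} f(x)\, e^{-2\pi i (x \cdot \theta)\gamma}\, dx = \hat f(\gamma\theta).$ ■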
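A worked instance may help fix the normalization in (2.3). The Gaussian $f(x) = e^{-\pi |x|^2}$ on $\mathbb{R}^d$ is chosen here purely for illustration (it is not an example taken from the text), and the computation assumes the Fourier transform is normalized with the kernel $e^{-2\pi i x \cdot \xi}$, so that $f$ is its own transform and $\int_{\mathbb{R}^{d-1}} e^{-\pi |y|^2}\, dy = 1$:

```latex
\begin{align*}
R_\theta f(s) &= \int_{\theta^\perp} e^{-\pi (s^2 + |y|^2)}\, dy = e^{-\pi s^2}, \\
(R_\theta f)^\wedge(\gamma) &= \int_{\mathbb{R}} e^{-\pi s^2}\, e^{-2\pi i s \gamma}\, ds = e^{-\pi \gamma^2}, \\
\hat f(\gamma\theta) &= e^{-\pi |\gamma\theta|^2} = e^{-\pi \gamma^2}.
\end{align*}
```

Both sides of the slice identity therefore agree for every direction θ, as they must for a radial function.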


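Corollary to 2.5 (Fourier inversion). Let $f \in S(\mathbb{R}^d)$. Then

$f(x) = \int_{S_+^{d-1}} \int_{-\infty}^{\infty} (R_\theta f)^\wedge(r)\, e^{2\pi i r (x \cdot \theta)}\, |r|^{d-1}\, dr\, d\theta$  (2.4)

where $S_+^{d-1}$ denotes the upper-half sphere in $\mathbb{R}^d$.

Proof. Writing the standard Fourier inversion formula

$f(x) = \int_{\mathbb{R}^d} \hat f(\xi)\, e^{2\pi i x \cdot \xi}\, d\xi$

in polar coordinates and using (2.3) gives

$f(x) = \int_{S^{d-1}} \int_0^\infty (R_\theta f)^\wedge(r)\, e^{2\pi i r (x \cdot \theta)}\, r^{d-1}\, dr\, d\theta$
$\qquad = \int_{S_+^{d-1}} \int_{-\infty}^{\infty} (R_\theta f)^\wedge(r)\, e^{2\pi i r (x \cdot \theta)}\, |r|^{d-1}\, dr\, d\theta$

since $(R_{-\theta} f)^\wedge(r) = (R_\theta f)^\wedge(-r)$. ■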

Definition 2.6. Let $f \in S(\mathbb{R}^d)$, $\alpha \in \mathbb{R}$. Then we define the Riesz potential
operator, $I^\alpha$, by $(I^\alpha f)^\wedge(\xi) = |\xi|^{-\alpha} \hat f(\xi)$.

If $\alpha = -2$, then $I^\alpha = -\frac{1}{(2\pi)^2}\Delta$, where Δ is the Laplacian. If $\alpha = -1$,
we refer to $I^{-1}$ as the Lambda operator, Λ, which is important in local
tomography. Note that $\Lambda f = \frac{1}{2\pi}(-\Delta)^{1/2} f$, see [9, 18, 19, 7].
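The constants in these two identities can be checked in one line each. The short derivation below assumes only that the Fourier transform is normalized with the kernel $e^{-2\pi i x \cdot \xi}$, so that each partial derivative $\partial_{x_j}$ corresponds to multiplication by $2\pi i \xi_j$:

```latex
\begin{align*}
(-\Delta f)^\wedge(\xi) &= 4\pi^2 |\xi|^2\, \hat f(\xi), \\
(I^{-2} f)^\wedge(\xi) &= |\xi|^2\, \hat f(\xi) = \frac{1}{(2\pi)^2}\, (-\Delta f)^\wedge(\xi), \\
(\Lambda f)^\wedge(\xi) &= (I^{-1} f)^\wedge(\xi) = |\xi|\, \hat f(\xi)
   = \frac{1}{2\pi}\, \bigl( (-\Delta)^{1/2} f \bigr)^\wedge(\xi).
\end{align*}
```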

Proposition 2.7. Let $f \in S(\mathbb{R}^d)$, $\alpha < d$. Then

$f = \tfrac{1}{2}\, I^{-\alpha} R^{\#} I^{\alpha - d + 1} R f.$  (2.5)

Proof. See [19]. ■

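Corollary to 2.7. If α = 0 then
1) if d is even,

$f = \frac{1}{2(2\pi i)^{d-2}}\, R^{\#} H \partial_s^{d-1} R_\theta f,$

2) if d is odd,

$f = \frac{1}{2(2\pi i)^{d-1}}\, R^{\#} \partial_s^{d-1} R_\theta f,$

where H is the Hilbert transform on $\mathbb{R}$, i.e., for $\gamma \in \mathbb{R}$,

$(Hf)^\wedge(\gamma) = \frac{1}{2\pi i}\, \sigma(\gamma)\, \hat f(\gamma)$  (2.6)

(σ is the signum function) and $\partial_s$ means differentiation with respect to s.

If α = −1 and d is even,

$\Lambda f = I^{-1} f = \frac{1}{2(2\pi i)^{d}}\, R^{\#} \partial_s^{d} R_\theta f.$  (2.7)

Proof. For d even, d − 1 is odd, so for any $h \in S(\mathbb{R})$,

$I^{1-d} h(s) = \int_{\mathbb{R}} |\gamma|^{d-1}\, \hat h(\gamma)\, e^{2\pi i \gamma s}\, d\gamma$
$\qquad = \frac{1}{(2\pi i)^{d-2}} \int_{\mathbb{R}} \frac{\sigma(\gamma)}{2\pi i}\, (2\pi i \gamma)^{d-1}\, \hat h(\gamma)\, e^{2\pi i \gamma s}\, d\gamma$
$\qquad = \frac{1}{(2\pi i)^{d-2}}\, H \partial_s^{d-1} h(s).$

Taking $h(s) = R_\theta f(s)$ gives 1). Similarly for d odd,

$I^{1-d} h(s) = \frac{1}{(2\pi i)^{d-1}}\, \partial_s^{d-1} h(s),$

and taking $h(s) = R_\theta f(s)$ gives 2). Finally, for d even,
$I^{-d} h(s) = (2\pi i)^{-d}\, \partial_s^{d} h(s)$, and taking $h(s) = R_\theta f(s)$ in (2.5) with
α = −1 gives (2.7). ■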

3.  Radon  transform  inversion  based  on  the  Gabor  transform 

3.1.  Background  and  preliminaries  on  the  Gabor  transform 

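Definition 3.1. We define the following operators. Given a function h
defined on $\mathbb{R}^d$,

1) $E_y h(t) = e^{2\pi i y \cdot t} h(t)$, $y \in \mathbb{R}^d$.

2) $T_x h(t) = h(t - x)$, $x \in \mathbb{R}^d$.

3) $D_a h(t) = h_a(t) = a^{-d/2} h(t/a)$, $a > 0$.

Given $f \in S(\mathbb{R}^d)$, and h a real-valued even function on $\mathbb{R}^d$ with $\|h\|_2 = 1$,
we define for $x \in \mathbb{R}^d$ and $\xi \in \hat{\mathbb{R}}^d$,

$h_{x,\xi}(t) = T_x E_\xi h(t) = e^{2\pi i \xi \cdot (t - x)}\, h(t - x).$

The Gabor transform of f with respect to h is defined by

$\Psi^{(d)}(h; f)(x, \xi) = \int_{\mathbb{R}^d} f(y)\, \overline{h_{x,\xi}(y)}\, dy = E_\xi h * f(x).$  (3.1)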
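A one-dimensional numerical sketch of the Gabor transform may help here. It assumes the atoms $h_{x,\xi}(y) = e^{2\pi i \xi (y - x)} h(y - x)$ and a Gaussian window normalized to unit $L^2$ norm, $h(t) = 2^{1/4} e^{-\pi t^2}$; the phase convention, the normalization, the function names and the Riemann-sum discretization are assumptions made for the illustration. With f equal to the window itself, the modulus of the transform is known in closed form, $e^{-\pi (x^2 + \xi^2)/2}$, whatever the phase convention.

```python
import cmath
import math

def window(t):
    """Gaussian window with unit L2 norm on the real line (assumed normalization)."""
    return 2 ** 0.25 * math.exp(-math.pi * t * t)

def gabor_transform(f, x, xi, half_width=8.0, step=0.01):
    """Riemann-sum approximation of the integral of f(y) * conj(h_{x,xi}(y)) dy,
    with h_{x,xi}(y) = exp(2*pi*i*xi*(y - x)) * window(y - x)."""
    n = int(round(2 * half_width / step))
    total = 0j
    for i in range(n + 1):
        y = -half_width + i * step
        atom = cmath.exp(2j * math.pi * xi * (y - x)) * window(y - x)
        total += f(y) * atom.conjugate()
    return total * step

coefficient = gabor_transform(window, x=0.5, xi=0.3)
expected_modulus = math.exp(-math.pi * (0.5 ** 2 + 0.3 ** 2) / 2)
print(abs(coefficient), expected_modulus)
```

The rapid decay of the modulus in both x and ξ is the joint time-frequency localization that motivates the Gaussian window.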

In what follows we will in general take the analyzing function h to
be the Gaussian or a scaled version of the Gaussian. Therefore we define
y(x)  for  x  ►  *51*'.  We  use  the  .same  notation  for  the  Gaussian  in 

different  dimensions.  The  dimension  will  always  be  clear  from  the  context. 
The following facts are readily proved [4, 17].

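Proposition 3.2.

$\int_{\hat{\mathbb{R}}^d} \int_{\mathbb{R}^d} |\Psi^{(d)}(h; f)(x, \xi)|^2\, dx\, d\xi = \int_{\mathbb{R}^d} |f(t)|^2\, dt.$  (3.2)

$\Psi^{(d)}(h; f)(x, \xi) = \int_{\hat{\mathbb{R}}^d} \int_{\mathbb{R}^d} \Psi^{(d)}(h; f)(x', \xi')\, e^{2\pi i \xi' \cdot (x - x')}\, \Psi^{(d)}(h; h)(x - x', \xi - \xi')\, dx'\, d\xi'.$  (3.3)

$f(t) = \int_{\hat{\mathbb{R}}^d} \int_{\mathbb{R}^d} \Psi^{(d)}(h; f)(x, \xi)\, h_{x,\xi}(t)\, dx\, d\xi.$  (3.4)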

(3.2) means that $\Psi^{(d)}$ is an isometry on $L^2$. That is, $\Psi^{(d)}$ is an injection
from $L^2(\mathbb{R}^d)$ into $L^2(\mathbb{R}^d \times \hat{\mathbb{R}}^d)$ and is invertible on its range. The
inversion formula (3.4) is valid as written when f has sufficient smoothness and
decay. Otherwise the integral on the right must be interpreted as an
appropriate limit. This inversion formula is formally analogous to the Fourier
inversion formula.

We now give a brief exposition of some of the sampling and interpolation
properties of the Gabor transform which will be useful later on. Details
and proofs may be found in the cited references.

The reproducing formula (3.3) characterizes the range of the map $\Psi^{(d)}$.
It shows that this range is very small and hence that $\Psi^{(d)} f$ contains a lot of
redundant information about f. This fact has been exploited in [10, 11] to
show that in fact $f \in S(\mathbb{R}^d)$ is completely recoverable from any sufficiently
dense sampling of its Gabor transform. This has also been shown in other
contexts for regular lattices [4, 17, 20]. The necessary density of the lattice
depends only on the analyzing function h and not on f. For example, it
is known that when h is the Gaussian, then any $f \in S(\mathbb{R}^d)$ can be recovered
completely from the samples $\Psi^{(d)}(h; f)(n/2, m)$ with $n, m \in \mathbb{Z}^d$ in the
following sense [4, 5].

Proposition  3.3.  Given  f  •  SiT’P'l,  there  is  a  function  o  with  exponential 
decay  in  time  and  frequency  such  that 

fix!  ^  ^  ‘1'“'  J.nnp,.  :„Gx!  ■  >.'> 

tt  ‘  ••  'U  ‘  C.-’ 

where  the  sum  coin  erges  absolutely  and  uniformly 

Proposition  3.3  says  in  particular  that  the  collection  1  I ,,  _>c.  is  a 
frame  for  the  Prechet  space  Si2P'  I  [  14,  12|.  In  tact,  p  and  u  aiv  \  ery  close  in 

many  senses  (in  particular,  in  the  I  '  and  I  '  sense)  and 

fl\)^-  ^  ^  H'*'  (p;  f  i(n,  2,  in  i()„  _>.,„;xi  (3. hi 

'  C.* 

is  a  good  approximation 

In  the  case  '  1,  even  more  can  be  said  In  |hj,  it  has  been  recently 

shown  that  the  collection 

ho, II  pix  ul,  n '  2. 

hf.n  v2(p„  2.,n  •  I  1i'"’Pm  r.M,  1.  (  0,1,...;  11'  2 

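is a non-orthogonal basis for $L^2(\mathbb{R})$ and an unconditional basis for $S(\mathbb{R})$
(see also [12, 13, 20]). This basis can be rewritten in the following more
convenient way.

$h_{0,n}(x) = \varphi(x - n)$,  $n \in \mathbb{Z}$.

$h_{\ell,n}(x) = \sqrt{2}\, \varphi(x - n/2) \cos(2\pi \ell x)$,  $\ell + n$ even; $n \in \mathbb{Z}$.

$h_{\ell,n}(x) = \sqrt{2}\, \varphi(x - n/2) \sin(2\pi \ell x)$,  $\ell + n$ odd; $n \in \mathbb{Z}$.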


{  Walnut 


196  } 

As before,

$f(x) \approx \sum_{\ell \ge 0} \sum_{n \in \mathbb{Z}} \langle f, h_{\ell,n} \rangle\, h_{\ell,n}(x)$

is a good approximation. The advantage of such a basis is that a real-valued
function f is developed as a series of real-valued functions. Also, its close
relation to the Gabor functions allows for easy computation.


3.2.  Inversion  formulas 


Proposition  3.4.  Let  A  >  0  and  let  the  collection  of  functions  ‘  be 
defined  by 


91'. „ 

-  Ovlx  A 

'  -n 

1.  n  1;  7. 

th-  M 

■  \  2q,vlx 

A  ' 

-’n  2!cosl2nA'  ‘(x) 

(  n  even;  n  -  2. 

-  V  2gA|x 

A  ' 

-’ll  2lsin(2nA'  -fx! 

(  •  n  odd;  n  •  L 

Then  g'„ 

’  is  an  unconditional  basis  for  SitHI.  Moreover. 

1  •  V  21  ,  1 

1  .-..q. 

n  Z. 

9  ( ,  11 

■  1  \  21  ,  , 

'  •  tt  J 

dv  .',0  *  I  .V.  ^gi 

i  ‘  n  even;  ti  ■  2 

fl.' „ 

l.  i\  21  , 

*  •’  tt 

2  ( f  1  g  I  \i  .'(-g  \ 

(  •  n  odd;  n  •  L 

Proof. The first part is merely a dilation of the Wilson basis defined
previously. The second part is an easy computation. ■

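Proposition 3.5 (Inversion with Wilson bases). Let $f \in S(\mathbb{R}^d)$, and
suppose that for each $\theta \in S^{d-1}$,

$R_\theta f(s) = \sum_\ell \sum_n c_{\ell,n}(\theta)\, g^\lambda_{\ell,n}(s).$  (3.7)

Then

1) If d is even,

$f(x) = \frac{1}{(2\pi i)^{d-2}} \sum_\ell \sum_n \int_{S_+^{d-1}} c_{\ell,n}(\theta)\, H \partial^{d-1} g^\lambda_{\ell,n}(x \cdot \theta)\, d\theta$  (3.8)

and

$\Lambda f(x) = \frac{1}{(2\pi i)^{d}} \sum_\ell \sum_n \int_{S_+^{d-1}} c_{\ell,n}(\theta)\, \partial^{d} g^\lambda_{\ell,n}(x \cdot \theta)\, d\theta.$  (3.9)

2) If d is odd,

$f(x) = \frac{1}{(2\pi i)^{d-1}} \sum_\ell \sum_n \int_{S_+^{d-1}} c_{\ell,n}(\theta)\, \partial^{d-1} g^\lambda_{\ell,n}(x \cdot \theta)\, d\theta.$  (3.10)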


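Proof. Note first that since the sum in (3.7) converges absolutely and
uniformly and in $L^2$,

$(R_\theta f)^\wedge(r) = \sum_\ell \sum_n c_{\ell,n}(\theta)\, \hat g^\lambda_{\ell,n}(r).$

By (2.4), we have

$f(x) = \int_{S_+^{d-1}} \int_{\mathbb{R}} (R_\theta f)^\wedge(r)\, e^{2\pi i r (x \cdot \theta)}\, |r|^{d-1}\, dr\, d\theta$
$\qquad = \sum_\ell \sum_n \int_{S_+^{d-1}} c_{\ell,n}(\theta) \int_{\mathbb{R}} \hat g^\lambda_{\ell,n}(r)\, e^{2\pi i r (x \cdot \theta)}\, |r|^{d-1}\, dr\, d\theta$
$\qquad = \frac{1}{(2\pi i)^{d-2}} \sum_\ell \sum_n \int_{S_+^{d-1}} c_{\ell,n}(\theta) \int_{\mathbb{R}} \frac{\sigma(r)}{2\pi i}\, \hat g^\lambda_{\ell,n}(r)\, e^{2\pi i r (x \cdot \theta)}\, (2\pi i r)^{d-1}\, dr\, d\theta$
$\qquad = \frac{1}{(2\pi i)^{d-2}} \sum_\ell \sum_n \int_{S_+^{d-1}} c_{\ell,n}(\theta)\, H \partial^{d-1} g^\lambda_{\ell,n}(x \cdot \theta)\, d\theta.$

This proves (3.8), and (3.10) follows similarly.
To prove (3.9), note that

$\Lambda f(x) = \int_{\mathbb{R}^d} |\xi|\, \hat f(\xi)\, e^{2\pi i x \cdot \xi}\, d\xi$
$\qquad = \int_{S^{d-1}} \int_0^\infty r\, (R_\theta f)^\wedge(r)\, e^{2\pi i r (x \cdot \theta)}\, r^{d-1}\, dr\, d\theta$
$\qquad = \int_{S_+^{d-1}} \int_{-\infty}^{\infty} (R_\theta f)^\wedge(r)\, e^{2\pi i r (x \cdot \theta)}\, r^{d}\, dr\, d\theta.$

The result now follows just as in the proof of (3.8). ■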
Lemma  3,6.  Let  i,  e  SR'*,  \  >  0.  Then 


R9(E,.g^)(.s)  .  A'^^c  ''‘•®'‘’ti.9(jA(s). 


(3.12) 
Proof. With $\xi = (\xi \cdot \theta)\theta + \xi'$, we have


L,  QxI  sO  J  y  i  dy 


.  i)  • 


'"■“c  dy 


c''"  >  -c  ■’'"■'a''  '  i-'  "  0  '  dy 


-r'"  '  'c  ' 


Observe that $|\xi'|^2 = |\xi|^2 - (\xi \cdot \theta)^2$ to complete the proof. ■

Proposition  3.7. 

I)  i’or  d 


'  <  '  >.  1  •  ,i  ■ 

Jlirr'  i’‘  - 


‘  fid!’  M  ,  didd 


where H denotes the Hilbert transform, see (2.6),
2) For d odd,


*  '  ‘  >1  ,,i  I 


I  ,  I I 


i-'"  ’  '  d'J  'I  ,  ,u)i!v  •  didd. 


Proof. Both (a) and (b) follow from the inversion formulas in the Corollary
to Proposition 2.7 and from Lemma 3.6. ■


Corollary to 3.7. For d even,


Al  ,  U\l\i 


2(2-Til'' 


a'  ’i-  ' 


'  ‘  ■’  'd‘i|  ,  „cuU  didd 


Proposition  3.8  (Filtered  backprojection).  l  ot  f  S|dP' i.  A  d 
1)  ltd  is  even. 


'I'  l(U;fllx,f-l  ,  ,y  ‘c 

2(2.11 1*'  ‘ 


I  sV  'itV 


,,:v\  1  O'  lJa't  ' 


\w:  'h  lUi.N  •  Riit'lx  d)dd 
2) If d is odd,


2(2711)^ 


^1  2g~7TA  'itr 


c"-'  *  R0f(x-0)d0 


Proof.  By  Proposition  3.7, 

By  Proposition  2.4, 


H''‘i’(g,K:f)(x,^,)  =  EtgA^f(x) 


=  a' 


,tt\  '  !£.  0.-  ll  -.1 


1  "‘'Et  0g.\  ♦  R0t(x  •  01  d0. 


From  this  the  result  follows.  I 

Corollary  to  3.8. 


1) If d is even,


:g,v;Af)(x.i,) 


2(2ml‘’ 


A'  -e 


f"  a,  '(E  0)''E''  (d‘' 

i  0  ' 


;  R0f  l|x  0,  (L  ■  0)  d0 


w'here  a,  ^  (‘')(27Ti)'. 
2)  If  d  is  odd. 


4''‘''(gx;f)(x,£.|  -  ‘c 

2(2m)‘‘  ' 


(11  j. 

^  e 

'  S  ‘1  ' 


„nA  0,d,  .  01d0 


where  a,  -  ( ‘' '  ' 


i  -  (  i  ')(27Ti|'. 
Proof. Both (a) and (b) follow by the same argument as in Proposition 3.8
together with an application of Leibniz's rule. ■

4. Radon inversion based on the wavelet transform


4.1. Background and preliminaries on the wavelet transform


Definition 4.1. Given g a real-valued, square-integrable radial function on
$\mathbb{R}^d$ which satisfies

$\int_0^\infty |\hat g_0(s)|^2 / s\; ds < \infty$  (4.1)

(where $\hat g(\xi) = \hat g_0(|\xi|)$), we define the wavelet transform of f by

$\Phi^{(d)}(g; f)(u, v) = \int_{\mathbb{R}^d} f(t)\, e^{-ud/2}\, g(e^{-u} t - v)\, dt = f * D_{e^u} g(e^u v)$  (4.2)

where $u \in \mathbb{R}$ and $v \in \mathbb{R}^d$. Any function satisfying (4.1) is called admissible.
As with the Gabor transform, the following are easily proved [16].

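Proposition 4.2.

$\int_{\mathbb{R}^d} \int_{\mathbb{R}} |\Phi^{(d)}(g; f)(u, v)|^2\, du\, dv = \int_0^\infty |\hat g_0(s)|^2 / s\; ds \int_{\mathbb{R}^d} |f(t)|^2\, dt.$  (4.3)

$\Phi^{(d)}(g; f)(u, v) = \int_{\mathbb{R}^d} \int_{\mathbb{R}} \Phi^{(d)}(g; f)(u', v')\, \Phi^{(d)}(g; g)(u - u', v - e^{u' - u} v')\, du'\, dv'.$  (4.4)

$f(t) = \int_{\mathbb{R}^d} \int_{\mathbb{R}} \Phi^{(d)}(g; f)(u, v)\, e^{-ud/2}\, g(e^{-u} t - v)\, du\, dv.$  (4.5)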

When d = 1, it is possible to construct compactly supported, differentiable
functions ψ such that the collection $\{2^{j/2} \psi(2^j x - k)\}_{j,k \in \mathbb{Z}}$ is an
orthonormal basis for $L^2(\mathbb{R})$. Computing the expansion of a given f in this
basis is extremely fast numerically [3].
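The speed of such expansions can be seen in the simplest compactly supported case, the Haar wavelet, which is not differentiable and is used here only as an illustrative stand-in for the smoother wavelets the text has in mind. The sketch below, with hypothetical function names, computes the orthonormal Haar coefficients of a sampled signal whose length is a power of two. Each pass halves the data, so the total work is proportional to the signal length.

```python
import math

def haar_forward(signal):
    """Orthonormal Haar decomposition of a list whose length is a power of two.
    Returns [coarsest average] followed by detail coefficients, coarse to fine."""
    approx = list(signal)
    details = []
    while len(approx) > 1:
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        detail = [(a - b) / math.sqrt(2) for a, b in pairs]
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
        details = detail + details  # coarser details go in front
    return approx + details

def haar_inverse(coeffs):
    """Inverse of haar_forward: rebuilds the signal level by level."""
    approx = coeffs[:1]
    pos = 1
    while pos < len(coeffs):
        detail = coeffs[pos:pos + len(approx)]
        pos += len(approx)
        finer = []
        for a, d in zip(approx, detail):
            finer.append((a + d) / math.sqrt(2))
            finer.append((a - d) / math.sqrt(2))
        approx = finer
    return approx

coeffs = haar_forward([1.0, 2.0, 3.0, 4.0])
restored = haar_inverse(coeffs)
print(coeffs, restored)
```

Because the transform is orthonormal, the sum of squares of the coefficients equals that of the samples, which gives a quick sanity check on any implementation.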


4.2.  Inversion  formulas 

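Proposition 4.3 (Inversion with wavelets). Let $f \in S(\mathbb{R}^d)$, and suppose
that for each $\theta \in S^{d-1}$,

$R_\theta f(s) = \sum_{j \in \mathbb{Z}} \sum_{k \in \mathbb{Z}} c_{j,k}(\theta)\, \psi_{j,k}(s)$  (4.6)

where $\psi_{j,k}(s) = 2^{j/2} \psi(2^j s - k)$. Then

1) If d is even,

$f(x) = \frac{1}{(2\pi i)^{d-2}} \sum_{j \in \mathbb{Z}} \sum_{k \in \mathbb{Z}} \int_{S_+^{d-1}} c_{j,k}(\theta)\, H \partial^{d-1} \psi_{j,k}(x \cdot \theta)\, d\theta$  (4.7)

and

$\Lambda f(x) = \frac{1}{(2\pi i)^{d}} \sum_{j \in \mathbb{Z}} \sum_{k \in \mathbb{Z}} \int_{S_+^{d-1}} c_{j,k}(\theta)\, \partial^{d} \psi_{j,k}(x \cdot \theta)\, d\theta.$  (4.8)

2) If d is odd,

$f(x) = \frac{1}{(2\pi i)^{d-1}} \sum_{j \in \mathbb{Z}} \sum_{k \in \mathbb{Z}} \int_{S_+^{d-1}} c_{j,k}(\theta)\, \partial^{d-1} \psi_{j,k}(x \cdot \theta)\, d\theta.$  (4.9)

Proof. This follows exactly as in Proposition 3.5. ■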

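Lemma 4.4. $R_\theta(D_{e^u} g)(s) = e^{u(d-1)/2}\, D_{e^u} R_\theta g(s)$.

Proof.

$e^{-ud/2} \int_{\theta^\perp} g(e^{-u}(s\theta + y))\, dy = e^{-ud/2} \int_{\theta^\perp} g(e^{-u} s\theta + e^{-u} y)\, dy$
$\qquad = e^{-ud/2}\, e^{u(d-1)} \int_{\theta^\perp} g(e^{-u} s\theta + y)\, dy$
$\qquad = e^{u(d-1)/2}\, D_{e^u} R_\theta g(s).$

■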

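Proposition 4.5. For $g \in S(\mathbb{R}^d)$,

$D_{e^u} g(x) = \tfrac{1}{2}\, e^{-u(d-1)/2} \int_{S^{d-1}} D_{e^u} I^{1-d} R_\theta g(x \cdot \theta)\, d\theta.$

Proof. Note first that for any $h \in S(\mathbb{R})$, $\alpha \in \mathbb{R}$,

$I^\alpha D_{e^u} h(x) = \int_{\mathbb{R}} |\gamma|^{-\alpha}\, (D_{e^u} h)^\wedge(\gamma)\, e^{2\pi i \gamma x}\, d\gamma$
$\qquad = e^{u/2} \int_{\mathbb{R}} |\gamma|^{-\alpha}\, \hat h(e^u \gamma)\, e^{2\pi i \gamma x}\, d\gamma$
$\qquad = e^{u/2}\, e^{\alpha u}\, e^{-u} \int_{\mathbb{R}} |\gamma|^{-\alpha}\, \hat h(\gamma)\, e^{2\pi i \gamma e^{-u} x}\, d\gamma$
$\qquad = e^{\alpha u}\, D_{e^u} I^\alpha h(x).$

5. Conclusions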

We have seen how the Gabor and wavelet transforms relate to the Radon
transform. We have derived inversion formulas for the Radon transform
based on Gabor and wavelet expansions either by a direct method, or by
filtered backprojection. The second approach gives directly the Gabor or
wavelet transform of f from knowledge of the one-dimensional transform of
$R_\theta f$ for each θ.

The  idea  of  using  localizing  transforms  to  invert  the  Radon  transform 
may  prove  practical  for  the  following  reasons. 

1) Discretization properties of the continuous parameter Gabor and
wavelet transforms are well understood. Recovery of a signal from
sparse and even irregular samples of its Gabor or wavelet transform
has been studied in [11, 10, 20]. Moreover, an interpolation theory
exists for the Gabor or wavelet transform which allows recovery of
the continuous parameter transform from its samples at a regular
or irregular lattice. This kind of built-in interpolation may enhance
numerical stability.

2)  Fast  numerical  algorithms  exist  for  computing  the  Gabor  and  wavelet 
expansions  of  signals. 

3) The spatial localization properties of the Gabor and wavelet transforms
suggest efficiency in the odd dimension case and also for local tomography
in the plane. The formulas for inversion of the Radon transform
in both the Gabor and wavelet case are local in odd dimensions, and in
even dimensions $\Lambda f$ can be recovered in a local fashion.

4)  The  inversion  formulas  allowing  the  recovery  of  the  Gabor  and  wavelet 
transforms  directly  from  the  Radon  transform  data  allow  the  selective 
recovery  of  f  or  Af  at  certain  frequencies  (the  Gabor  case)  or  at  certain 
resolutions  (the  wavelet  case).  This  feature  can  be  useful  in  the  noise 
reduction  of  tomographic  images  [21]. 
Those dynamical systems generated by integer matrices operating on
multidimensional tori are useful general exemplars. In particular, Bill
Moran and I have recently explored notions of dependence between pairs
of such systems.

It is well known that if the m × m integer matrix A is nonsingular and
has no roots of unity as eigenvalues, then $(A^n x)$ is uniformly distributed for
almost all vectors x on the m-torus (x is A-normal).

We have proved that given two such matrices A and B which commute,
A-normality coincides with B-normality if and only if $A^r = B^s$ for
some positive integers r and s. This confirms a longstanding number theory
conjecture of Wolfgang Schmidt.


1. Introduction

Let me say at the outset that the main new result which eventually I will
sketch is joint work with William Moran [2]. That, in turn, traces back
through ideas of Schmidt [8, 9, 10], and Cassels [3], to a problem of Steinhaus
which can be presented as pure number theory. It will, I hope, add interest
to emphasize the connexion with ergodic theory and dynamical systems.

The dynamical systems in question have discrete time and are determined
by the action of an n × n integer matrix T on the n-dimensional torus
$\mathbb{T}^n = \mathbb{R}^n/\mathbb{Z}^n$. From an initial vector x in $\mathbb{T}^n$ the system evolves along the
orbit $(T^k x)$ as time k = 1, 2, . . . varies.

The simplest example occurs when n = 1 and the operator T is multiplication
by 2. We may think of the initial vector x being in ]0,1] and having
binary expansion $x = \sum_{k=1}^{\infty} x_k 2^{-k}$ or $x = .x_1 x_2 x_3 \cdots$. The evolution of the
system amounts to shifting along the tail of the expansion of x. As we all
know, this simple system illustrates some of the basic notions associated
with chaos theory. In particular we may obviously choose x to exhibit cycles
of arbitrary length, yet for almost all x the orbit is "randomly" distributed
over $\mathbb{T}$. This butterfly effect relates even to time averages, since, for almost
all x and every continuous real-valued function f on $\mathbb{T}$, we know that

$$\lim_{K\to\infty} \frac{1}{K} \sum_{k=1}^{K} f(2^k x) = \int f\,dm, \qquad (1.1)$$

where m is Haar measure on $\mathbb{T}$. In other words, x is normal in base 2.
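The identification of the doubling map with a shift on binary digits, and the existence of cycles, can be checked directly. The following is a minimal sketch in exact rational arithmetic; the function names are illustrative only, and the half-open interval [0, 1) is used in place of ]0, 1] for convenience. Floating-point numbers are deliberately avoided, since each doubling discards one bit of a binary float.

```python
from fractions import Fraction


def doubling_orbit(x, steps):
    """Return [x, Tx, ..., T^steps x] for T(x) = 2x mod 1, computed exactly."""
    orbit = [x]
    for _ in range(steps):
        x = (2 * x) % 1
        orbit.append(x)
    return orbit


def binary_digits(x, count):
    """Return the first `count` binary digits of x, for x in [0, 1)."""
    digits = []
    for _ in range(count):
        x *= 2
        digit = int(x >= 1)
        digits.append(digit)
        x -= digit
    return digits


x = Fraction(1, 7)                      # binary expansion .001001001...
orbit = doubling_orbit(x, 3)            # returns to 1/7 after three steps
print(orbit)
print(binary_digits(x, 6))              # digits of x
print(binary_digits(orbit[1], 5))       # digits of Tx: the same digits shifted by one
```

A rational starting point always gives an eventually periodic orbit; the almost-everywhere statement (1.1) concerns the complementary, typical behaviour and cannot be observed from such examples.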

That famous result of Borel [1] (in view of its place of publication,
not inappropriate for a workshop on Italian soil!), has been extended to the
multidimensional case by Rokhlin [7], and Cigler [4]. In fact we say that
$(T^k x)$ is uniformly distributed if, for all real-valued continuous f on $\mathbb{T}^n$,

$$\lim_{K\to\infty} \frac{1}{K} \sum_{k=1}^{K} f(T^k x) = \int f\,dm, \qquad (1.2)$$

where m is now Haar measure on $\mathbb{T}^n$. We then say that x is T-normal.
Moreover we call the matrix T ergodic if m-almost all x are T-normal. It
turns out that T is ergodic if and only if T is invertible and has no root of
unity as an eigenvalue.
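The ergodicity criterion just stated can be tested exactly for a given integer matrix. The sketch below is one possible way to do so and is not taken from the paper: it reads "invertible" as nonsingular, and it uses the facts that some eigenvalue λ of T satisfies λ^m = 1 precisely when det(T^m − I) = 0, and that a root of unity of order m can be an eigenvalue of an n × n integer matrix only if φ(m) ≤ n, which forces m ≤ 2n² because φ(m) ≥ √(m/2). The helper names are hypothetical.

```python
def det_int(M):
    """Exact determinant of a square integer matrix (Bareiss elimination)."""
    n = len(M)
    A = [row[:] for row in M]
    sign, prev = 1, 1
    for k in range(n - 1):
        if A[k][k] == 0:
            # Look for a row below with a nonzero entry in column k.
            swap = next((i for i in range(k + 1, n) if A[i][k] != 0), None)
            if swap is None:
                return 0
            A[k], A[swap] = A[swap], A[k]
            sign = -sign
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                A[i][j] = (A[i][j] * A[k][k] - A[i][k] * A[k][j]) // prev
        prev = A[k][k]
    return sign * A[n - 1][n - 1]


def mat_mul(A, B):
    """Product of two square integer matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def is_ergodic(T):
    """True if T is nonsingular and no eigenvalue of T is a root of unity."""
    n = len(T)
    if det_int(T) == 0:
        return False
    power = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(2 * n * n):          # orders m = 1, ..., 2n^2 suffice
        power = mat_mul(power, T)
        shifted = [[power[i][j] - int(i == j) for j in range(n)]
                   for i in range(n)]
        if det_int(shifted) == 0:       # T^m has eigenvalue 1
            return False
    return True


print(is_ergodic([[2, 1], [1, 1]]))     # hyperbolic toral map
print(is_ergodic([[0, -1], [1, 0]]))    # rotation by a quarter turn
```

Working with integer determinants avoids deciding numerically whether a computed eigenvalue lies on the unit circle.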

The problem of Steinhaus, solved by Cassels, is: "Do there exist numbers
normal to base 2 but not normal to base 3?" Schmidt [9] proved the
definitive one-dimensional result. For integer bases s, t, all numbers normal
to base s are normal to base t if and only if $s^a = t^b$ for integers a, b. (Otherwise
there are uncountably many numbers not normal to either one of the
bases and normal to the other.)

Cigler [4] proved that if the ergodic matrices S, T are rationally dependent
in the sense that $S^p = T^q$ then S-normality and T-normality coincide.
We are thus tempted to say that the dynamical systems generated by ergodic
matrices S, T are dependent if S-normality and T-normality coincide. Is it
too much to hope that dependence implies rational dependence? In fact
Schmidt conjectured this in [10] and proved by a tour de force that the result
holds under the additional hypotheses that (i) ST = TS and that (ii) every
eigenvalue of S has modulus strictly greater than one. It is hypothesis (ii)
which Moran and I have removed so we have the result for all commuting
systems. (It is interesting to note that dependence implies rational dependence
when S, T are assumed to be automorphisms of $\mathbb{T}^n$. This nice result of
Sigmund [11] is much simpler, and essentially disjoint because commuting
automorphisms are automatically rationally dependent.)

2.  Schmidt’s  result  in  one  dimension 

Let us fix integer bases s, t. Maxfield's result that s-normality and t-normality
coincide when $s^a = t^b$ is not difficult. Accordingly let us assume that s, t
are rationally independent. We seek to establish the existence of numbers
which are s-normal but not t-normal.

We know how to tinker with a base t expansion to prevent normality
and, intuitively, we feel this should not affect the base s expansion of such
numbers. More precisely we may choose a probability measure $\mu$ with
respect to which almost all numbers fail to be t-normal (e.g., if t = 3 we may
take a uniform mass over the Cantor middle third set) and we may hope
that $\mu$-almost all numbers are s-normal. In view of Weyl's criterion we may
even expect to achieve that last step by estimating Fourier transforms. This
recipe is very much the one used by both Cassels and Schmidt. Because they
manipulated the base t expansion their measure $\mu$ is generically of infinite
convolution type.

In a sequence of papers [2, 5, 6], Charles Pearce, Moran and I explored
the possibility of choosing instead a Riesz product for $\mu$. It turns out that
there are significant technical gains although our first efforts were more than
somewhat clumsy. Here, with the benefit of hindsight, is a cleaner version.
Choose

$$\mu = \lim_{K\to\infty} \prod_{k=1}^{K} (1 + \cos 2\pi t^k x)\cdot m,$$

where m is Haar measure on $\mathbb{T}$. Then $\mu$ is a probability measure
whose Fourier transform vanishes off words of the form $\sum_k \varepsilon_k t^k$ with
$\varepsilon_k \in \{0, \pm 1\}$. Also

$$\hat{\mu}\Big(\sum_k \varepsilon_k t^k\Big) = \prod_k 2^{-|\varepsilon_k|}.$$

Note, in particular, that

$$\frac{1}{K} \sum_{k=1}^{K} \exp(2\pi i t^k x) \to \frac{1}{2} \quad (\mu\text{ a.e.})$$

Comparison with (1.1) or (1.2) with f(x) = exp(2πix) shows that $\mu$-almost
all numbers are not t-normal.
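The Fourier coefficients of a finite partial product of this Riesz product can be computed exactly, which makes the structure of its spectrum visible. The sketch below is illustrative only (the function name and the choice t = 3, K = 3 are assumptions): it writes each factor as 1 + (z^{t^k} + z^{-t^k})/2 with z = exp(2πix) and multiplies the factors out as Laurent polynomials. For t = 3 the frequencies Σ ε_k t^k with ε_k ∈ {0, ±1} are pairwise distinct, and each carries the coefficient 2^{-Σ|ε_k|}.

```python
from fractions import Fraction
from itertools import product


def riesz_coefficients(t, K):
    """Fourier coefficients of prod_{k=1..K} (1 + cos(2*pi*t**k*x)).

    Returns a dict {frequency: coefficient}, obtained by exact
    multiplication of Laurent polynomials in z = exp(2*pi*i*x).
    """
    coeffs = {0: Fraction(1)}
    half = Fraction(1, 2)
    for k in range(1, K + 1):
        step = t ** k
        new = {}
        for n, c in coeffs.items():
            for shift, weight in ((0, Fraction(1)), (step, half), (-step, half)):
                new[n + shift] = new.get(n + shift, 0) + c * weight
        coeffs = new
    return coeffs


coeffs = riesz_coefficients(t=3, K=3)
for eps in product((-1, 0, 1), repeat=3):
    freq = sum(e * 3 ** (k + 1) for k, e in enumerate(eps))
    expected = Fraction(1, 2) ** sum(abs(e) for e in eps)
    assert coeffs[freq] == expected
print(len(coeffs), coeffs[3], coeffs[12], coeffs.get(5, 0))
```

The coefficient at frequency t^k equal to 1/2 is what makes the averages of exp(2πi t^k x) fail to tend to zero for typical points of the measure.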

Next we claim (following the lines of Davenport, Erdős and LeVeque):

Proposition 2.1. Suppose that, for all r,

$$\sum_{N=1}^{\infty} N^{-3} \sum_{k=1}^{N} \sum_{i=1}^{k-1} \big|\hat{\mu}(r(s^k - s^i))\big| < \infty, \qquad (2.1)$$

then $\mu$-almost all numbers are s-normal.
Proof. In view of Weyl's criterion (i.e., (1.2) with n = 1 and f(x) =
exp(2πirx)) it suffices to prove that, for all nonzero integers r,

$$\frac{1}{N} \sum_{k=1}^{N} \exp(2\pi i r s^k x) \to 0 \quad (\mu\text{ a.e.}) \qquad (2.2)$$

Because |exp(2πiy)| = 1, it is in fact enough to show (2.2) along some mildly
lacunary increasing sequence $N = N_j$. In fact provided that $N_{j+1} \le (1+\varepsilon)N_j$,
we find

$$\Big| \frac{1}{N} \sum_{k=1}^{N} \exp(2\pi i r s^k x) - \frac{1}{M} \sum_{k=1}^{M} \exp(2\pi i r s^k x) \Big| < 2\varepsilon,$$

whenever $N_j \le N, M \le N_{j+1}$.

Observe also that (2.1) gives

$$\sum_{N} N^{-1} \int \Big| N^{-1} \sum_{k=1}^{N} \exp(2\pi i r s^k x) \Big|^2 d\mu(x) < \infty.$$

Choose $M_{j+1} = [(1+\varepsilon)M_j]$, then

$$\sum_{j} \Big( \frac{M_{j+1} - M_j}{M_j} \Big) \min_{M_j \le N < M_{j+1}} \int \Big| N^{-1} \sum_{k=1}^{N} \exp(2\pi i r s^k x) \Big|^2 d\mu(x) < \infty$$

and hence

$$\sum_{j} \int \Big| N_j^{-1} \sum_{k=1}^{N_j} \exp(2\pi i r s^k x) \Big|^2 d\mu(x) < \infty,$$

for some suitable increasing $(N_j)$ and the result follows. ■
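The passage from a hypothesis on Fourier coefficients to the mean-square bound used in the proof can be spelled out. What follows is a sketch under two assumptions that are not legible in the text as printed here: that the hypothesis of Proposition 2.1 is the convergence of $\sum_N N^{-3}\sum_{k\le N}\sum_{i<k}|\hat\mu(r(s^k-s^i))|$, and that $\hat\mu(n)=\int e^{-2\pi i n x}\,d\mu(x)$. It uses only $\hat\mu(0)=1$ and $|\hat\mu(-n)|=|\hat\mu(n)|$.

```latex
\begin{align*}
\int \Big| \frac{1}{N}\sum_{k=1}^{N} e^{2\pi i r s^k x} \Big|^2 d\mu(x)
  &= \frac{1}{N^2} \sum_{k=1}^{N}\sum_{i=1}^{N}
     \int e^{2\pi i r (s^k - s^i) x}\, d\mu(x) \\
  &= \frac{1}{N^2} \sum_{k=1}^{N}\sum_{i=1}^{N}
     \hat{\mu}\big(-r(s^k - s^i)\big) \\
  &\le \frac{1}{N} + \frac{2}{N^2} \sum_{k=1}^{N}\sum_{i=1}^{k-1}
     \big|\hat{\mu}\big(r(s^k - s^i)\big)\big| .
\end{align*}
```

Multiplying by $N^{-1}$ and summing over N, the diagonal term contributes $\sum_N N^{-2} < \infty$ and the off-diagonal term contributes twice the series assumed to converge, which gives the mean-square summability used above.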

It remains to establish (2.1). This we achieve crudely by counting the
number of possible nonzero Fourier coefficients, i.e., by counting solutions
of the equations

$$r(s^k - s^i) = \sum_{j=1}^{m} \varepsilon_j t^j. \qquad (2.3)$$

In fact let $G_N$ be the number of k ≤ N such that for some i with 0 ≤ i ≤
k − log N and some $\varepsilon_j$, equation (2.3) holds. In view of Proposition 2.1 it
suffices to check that

$$\sum_{N} N^{-2} G_N < \infty. \qquad (2.4)$$
This is because

$$\sum_{k=1}^{N} \sum_{i=1}^{k-1} \big|\hat{\mu}(r s^k - r s^i)\big| \le N\,(G_N + \log N).$$

To check (2.1) we use the independence of s, t and Alan Baker's famous
estimate that there is a constant C so that


n  log  s  -  m  log  t  i  >  C  ^ ,  '  2  a ' 

whenever  0  •  max!  n  ,  m  i  -  We  can  replaci-  s,  t  b\'  s'',  t'  without  liw" 
ot  generalitv  and  thereft're  can  assume  that  mini  s,  1 1  e  *. 

Lemma  2.2.  It  !2..Ti  holds  with  c..,  1  then  s“^  t"  hidongs  to  .i  union  ot  .o 

most  .V"!'  discs  of  radius  0l  C  '  ^  >.  K  ing  m  some  fixed  disc. . 


Proof.  t"'  :■  'll  s'  '■  1  'll  •  H  '  I '  ^'c  t'  "'i.  The  second  term  ot  till’ 

product  is  I  ■  OW  ^  ,  TIu  third  term  can  be  writti'ti  (with  w-  in  ['Luc 
of  c , ' as 


I  ■ 

\ 


I 


I,  SI  N 


L 


where 


R  ■ 


il  f  t  2. 

L  cl  1 


.  .  I 

V 


\ 


In  view  of  the  lemma  and  inei.iualitv  2.">c,  nono  of  the  discs  win  contain 
more  than  one  element.  It  follows  that  Gn  does  not  exceeii  N'"''  '  and  2  1 
is  established.  I 


3. N dimensions: underlying method

Following Schmidt we consider a somewhat wider class of matrices, the
almost integer matrices. These are n × n invertible matrices with rational
entries and all of whose eigenvalues are algebraic integers. Such a matrix is
ergodic if and only if no eigenvalue is a root of unity and it is these so-called
almost ergodic matrices that we use. Schmidt proved in [10] that for every
almost ergodic matrix T there exists an integer d (the denominator of T) such
that $dT^n$ is integral for all n = 1, 2, . . . This makes it possible to consider
Riesz products of the form

$$\mu = \lim_{K\to\infty} \prod_{k=1}^{K} (1 + \cos 2\pi \alpha T^k x)\cdot m,$$

where m is Haar measure on $\mathbb{T}^n$ and $\alpha$ is an integer vector multiplied
by the denominator of T. Of course this makes sense only if there is a
substitute for the lacunarity familiar in the one-dimensional case. We require
dissociateness in the sense that equations

$$\sum_{k=1}^{K} \varepsilon_k \alpha T^k = 0 \qquad (\varepsilon_k \in \{0, \pm 1, \pm 2\})$$

cannot arise unless all $\varepsilon_k = 0$. Because T-normality and $T^p$-normality coincide
we are at liberty to replace T by some suitable power to achieve
dissociateness. The appropriate result is the next lemma.

Lemma 3.1. Let T be almost ergodic. Then there is a positive integer p such
that $(\alpha T^{pk})$ is dissociate for all $\alpha \ne 0$ in $\mathbb{Q}^n$.

Proof. If every eigenvalue of T had modulus one then Dirichlet's theorem
would force some root of unity to be an eigenvalue in contradiction of ergodicity.
Accordingly we may replace T by $T^p$ to ensure that some eigenvalue
$\lambda_1$ of T has modulus greater than, say, 3. (The integral nature of $dT^n$ rules
out the possibility that all eigenvalues are inside the unit circle.)

Assume then that $|\lambda_1| > 3$ and $\lambda_1$ is an eigenvalue of T. First we decompose
T over the rationals into a direct sum of matrices whose characteristic
polynomials are of the form $q(x)^{\ell}$ where q is a monic irreducible over $\mathbb{Q}$.
At least one component of $\alpha$ in this decomposition will be nonzero and the
entries of $\alpha$ remain rational. Replacing T by a suitable component matrix
we can assume that the characteristic polynomial of T is a power of an irreducible.
Next we consider T as a linear operator on $\mathbb{C}^n$ and decompose $\mathbb{C}^n$ as
a direct sum of subspaces, $V_\lambda = \ker(\lambda - T)^{\ell}$ as $\lambda$ ranges over the eigenvalues
of T. The decomposition can be achieved over the splitting field of q and
the automorphism group of this field will permute the components of $\alpha$,
leaving $\alpha$ unchanged since it is rational. Thus the component of $\alpha$ in each
of these subspaces is nonzero. We may replace T and $\alpha$ by their components
on the subspace $V_{\lambda_1}$, and we choose s to be the largest integer such that
$\beta = \alpha(\lambda_1 - T)^s \ne 0$. Now any equation

$$\sum_k \varepsilon_k \alpha T^k = 0$$

leads to the equation

$$\sum_k \varepsilon_k \alpha (\lambda_1 - T)^s T^k = 0$$

which is

$$\sum_k \varepsilon_k \lambda_1^k \beta = 0$$

and that forces $\varepsilon_k = 0$. ■

Matters are now arranged so that analogues of the one-dimensional
case may be employed. In particular it is straightforward to see that $\mu$-almost
all vectors in $\mathbb{T}^n$ are not T-normal. The challenge is to prove that $\mu$-almost
all vectors are S-normal and this boils down to counting solutions of
matrix Diophantine equations of the form

$$\beta S^k - \beta S^j = \sum_{m=1}^{M(k,j)} \varepsilon_m \alpha T^m \qquad (\varepsilon_m \in \{0, \pm 1\}) \qquad (3.1)$$

4.  Irreducible  case 

For a workshop such as this it is inappropriate to plough through the technical
details of the proof so let me discuss the simplest case, that in which
the algebra $\mathcal{A}(S, T)$ generated by S, T is irreducible. That forms the basis for
an induction proof of the general case and the reader is referred to [2] for the
full details.

We  assume  then  that  O"  has  no  invariant  subspace  for  the  algebra 
>t|S,  T  land  that  for  each  a  in  0"  there  exists  |3  in  O"  with  X!.  ( 1^ '  = 

where  Gn  ( |.l  I  is  the  number  k  •-  N  such  that  for  some  i  with  0  s  i  t; 
k  ■  logNI  and  some  ci.C’.C! . t.v,  i  .r  It,,,  ‘  '0,  ±1!| 

S',  t.i 

(.IS*-  I.3S'  Y  t-ol'". 

1  1 

L.et  us  suppose  further  that  for  any  eigenvalues  A  of  S  and  p  of  F  with 
A"  A  p"'  and  max(irn':,  n")  N'  then 

In'  log  A  -  m'  log  p'  G' 

and  (as  a  consequence  of  replacing  S,  I  by  suitable  powers)  that  any  eigen¬ 
value  of  S  or  I  with  modulus  ,>  I  has,  in  fact,  modulus 

Lemma 4.1. Under the prevailing conditions there exists an n × n matrix U
over $\mathbb{Q}$ such that if $\Gamma_N$ is the number of all k for which

$$U(S^k - S^j) = \sum_{m=1}^{M(k,j)} \varepsilon_m T^m \qquad (4.1)$$

for some $0 \le j \le k - \log N$ and $\varepsilon_m \in \{0, \pm 1\}$ then $\sum_N N^{-2}\Gamma_N = \infty$. Moreover
U belongs to $\mathcal{A}(S, T)$.
Proof. We know that for each $\alpha \ne 0$ in $\mathbb{Q}^n$ there exists some $\beta$ in $\mathbb{Q}^n$ such
that

$$\beta(S^k - S^i) = \sum_{m=1}^{M(k,i)} \varepsilon_m \alpha T^m \qquad (4.2)$$

for infinitely many pairs (k, i) corresponding to k in $G_N(\beta)$. Choose a cyclic
vector $\alpha_0$ for $\mathcal{A}(S, T)$ and let $\beta_0$ be the corresponding vector as in (4.2).
Multiply both sides of (4.2) by an arbitrary element of $\mathcal{A}(S, T)$ to see that
(4.2) does indeed establish a linear correspondence which we can express
in the form $\beta = \alpha U$. Evidently U belongs to the algebra generated by S
and T. ■

Lemma 4.2. Let $\mathcal{A} \ne 0$ be an irreducible commutative subalgebra of $M_n(\mathbb{Q})$.
There is a finite field extension F of $\mathbb{Q}$ and a field isomorphism $\psi : \mathcal{A} \to F$
such that, for each A in $\mathcal{A}$, $\psi(A)$ is an eigenvalue of A. Moreover, given
some fixed S in $\mathcal{A}$ and an eigenvalue $\lambda_S$ of S, we may choose $\psi(S) = \lambda_S$.

Proof. If $A \in \mathcal{A}$ and $\ker A \ne 0$ then ker A would be a proper invariant
subspace of $\mathcal{A}$. Accordingly each A in $\mathcal{A}$ is invertible and $\mathcal{A}$ is a field which
is isomorphic to a finite extension F of $\mathbb{Q}$ under some map $\psi : \mathcal{A} \to F$.
We know from the Cayley-Hamilton theorem that A satisfies its own characteristic
equation and therefore $\psi(A)$ is an eigenvalue of A. The minimal
polynomial $m_S$ of any nonzero S in $\mathcal{A}$ is irreducible and so the Galois group
acts transitively on the roots of that polynomial. It follows that, given $\lambda_S$, we
may indeed choose $\psi(S) = \lambda_S$. ■

Proposition 4.3. Under the conditions of this section there are integers p, q
such that $S^p = T^q$.

Proof. We apply the last lemma to choose $\psi : \mathcal{A}(S, T) \to F$ such that $\psi(S) = \lambda_S$.
Now evaluate both sides of (4.1) under $\psi$ to obtain an analogue of
(2.3) with $\psi(U)$ in place of r, $\psi(S)$ in place of s, $\psi(T)$ in place of t. The
one-dimensional methods of the previous section force

$$\psi(S)^p = \psi(T)^q, \quad \text{for some } p, q,$$

and we deduce that

$$S^p = T^q.$$
5. Forwards and sideways

The Riesz product technology makes it possible almost to separate the linear
algebra from the Fourier analysis and number theory. That is why Moran
and I have been able to push Schmidt's methods much further. We have also
made further progress in the non-commutative case but that work is still in
preparation and is still far from resolving the "big" conjecture.

Absolutely fundamental throughout the work is the possibility of raising
S, T to suitable powers without affecting normality. This is very much
a feature of the (almost) integral nature of the matrices. Even in the one-dimensional
case there are difficult questions concerning non-integer bases.
For  example,  is  normal  to  base  2? 

In a forthcoming series of papers Berend, Moran, Pollington and myself
will demonstrate several new results on normality to non-integer bases. For
example we show how to construct generic examples of $\theta$ such that normality
to base $\theta$ does not imply normality to base $\theta^p$ and normality to base $\theta^p$ does
not imply normality to base $\theta$. Moran, Pollington and I can also show, for
example, that every number normal to base $\sqrt{10}$ is necessarily normal to
base 10 but we believe that the converse fails. Riesz products play a key role
there also.

This paper will contrast two wavelet-based image analysis techniques,
one fundamentally non-linear and the other essentially linear, to process
images arising from different physical sources and requiring fundamentally
different processing outcomes. The intent is to emphasize the flexibility
inherent in image processing algorithms even given the constraint that the
initial feature extraction process is a wavelet analysis. (It should be noted
that current physiological data from mammalian visual centers indicate
that a Gabor-like wavelet analysis is one of the first steps in animal visual
processing; in humans this becomes an exquisitely flexible, adaptable and
programmable process, to match the specific visual task required.)

We will first describe a non-linear system to locate specific key landmarks
on VLSI chip photomicrographs. These landmarks, easily visible
to a human observer, are not extractable by any linear spatial filtering
technique or any thresholding technique. However, if a series of Gabor
correlation planes (using appropriately selected size, frequency and orientation
parameters) are computed, it is possible merely by selecting the max
or min value for each pixel in the registered set of planes to produce an
"image" which clearly shows the desired landmarks. This is a non-linear
rule-based system selection "filter" following the essentially linear process
of Gabor correlation.

In  a  second  case,  we  will  show  that  actual  infrared  images  can  be 
filtered  to  select  items  of  specific  sizes  and  texture  by  a  linear  summation  of 
Gabor  correlation  planes  followed  by  a  simple  threshold  rule. 

The  purpose  of  this  work  is  to  demonstrate  that  Gabor  features  seem 
to  be  intrinsically  useful  for  image  processing  provided  that  flexibility  in  the 
use  of  its  fruits  is  adopted  by  the  system  designer. 


1.  Introduction 

This paper will describe two image segmenters specifically adapted to distinctly
different types of images. Both, however, are based on an initial
Gabor wavelet decomposition of their images and differ from each other
only in their post-Gabor-processing details [4]. The fact that the images are
derived from totally different sources and yet are both usefully processed by
a wavelet analyzer suggests to us that this may be a broadly useful technique.
We also note that the initial processing of images in the vertebrate brain stem
and in the mammalian visual cortex also includes a close approximation to a
Gabor wavelet decomposition of the scene being viewed by the animal, and
are thus further encouraged to explore the consequences of Gabor decomposition
of images. The first segmenter described is for photomicrographs of
VLSI chips obtained for the purpose of reverse-engineering and circuit verification
of the chips. The second segmenter is designed to locate potential
targets in a FLIR image. Finally, we will conclude with an appendix outlining
some Gabor-like processes now known to occur in animal visual systems.


2.  VLSI  Chip  image  processing 

Figure B.1 shows a typical photomicrograph of a portion of a VLSI circuit.
This is a 512 × 480 array of pixels derived from a TV camera image.
VLSI circuits are built up from a small repertoire of standardized circuit
elements such as resistors, transistors, flip-flops, switches, etc., which are
connected by straight metal conductors. Layers of the circuits are interconnected
on the chip by vias or contacts which are round or toroidally shaped
elements. Deriving the electrical circuit from such a photograph is a good
candidate for automation because the chips consist of a very large number
of iterated, stereotyped arrangements of a small set of possible elements.
Frequently, thousands or even millions of each element can be found in
currently used chips.

The first elements we chose to find are the round vias or contacts. Note
that while these are visible in Figure B.1 to a human observer, it is virtually
impossible to extract them with the usual image processing techniques. They
can be extracted, however, with a Gabor-wavelet-based process. The process
begins by performing a two dimensional correlation between the image and
an appropriate two dimensional Gabor wavelet. We use the term "Gabor
filtering" or "Gabor transformation" for this operation [3, 2]. This process is
described in the appendix of this paper.

Gabor transforms of the images are computed directly in the spatial
domain. Several factors make this method attractive. The image size is
512 × 480 pixels so that high speed two-dimensional array processors are
available, and spatial correlation allows control of the decimation of the
scene. With this technique, the Gabor transform is defined as the dot product
of the Gabor wavelet and the image at each point on the image.

The quality of images obtained from the chips varies among the different
chips and regions of the same chip. Therefore, the images are first
preprocessed so the Gabor filtering has the best chance to discriminate the
contacts. We generally use a special normalization technique to do this preprocessing.
The average brightness of a neighborhood of pixels around a
selected center pixel is computed and the brightness of the center pixel is
subtracted from this average. The result is then multiplied by 2 and added to
127, the middle of the total 0-255 brightness range in the system [7]. The effect
is edge enhancement with a normalized, constant average background.
Contacts, which typically appear as small bright regions surrounded by dark
rings, are emphasized. This local computation technique is similar to some
of the normalization processes performed by the vertebrate visual system
[11]. Figure B.2 shows the results of this process applied to Figure B.1.
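The normalization rule described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: the text does not give the neighborhood size, so a square window of side 2·radius + 1 is assumed, borders are handled by edge replication, and results are clipped to the 0-255 range. The function name `local_normalize` is hypothetical. The formula follows the text literally: 127 + 2 × (neighborhood average − center pixel).

```python
import numpy as np


def local_normalize(image, radius=4):
    """Return 127 + 2 * (local mean - center pixel), clipped to 0..255."""
    img = np.asarray(image, dtype=np.float64)
    size = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    # A summed-area table gives every size-by-size box sum from four lookups.
    integral = np.pad(padded, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    box_sum = (integral[size:, size:] - integral[:-size, size:]
               - integral[size:, :-size] + integral[:-size, :-size])
    local_mean = box_sum / size ** 2
    out = 127.0 + 2.0 * (local_mean - img)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


flat = np.full((16, 16), 100)
print(local_normalize(flat).min(), local_normalize(flat).max())
```

A uniform region maps to the constant 127 whatever its original brightness, which is the "normalized, constant average background" mentioned in the text.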

Pixel  intensity  values  can  range  from  0-255  in  our  eight  bit  system; 
full  black  to  full  white.  In  some  images,  however,  a  histogram  of  pixel 
values  shows  the  total  intensity  range  to  be  very  narrow.  Linear  contrast 
enhancement  can  be  performed  to  spread  the  variations  of  image  intensity 
over  the  full  range  available  to  the  image  system  [5].  This  allows  easier 
visual  examination  of  the  image. 
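Linear contrast enhancement of this kind amounts to an affine rescaling of the observed intensity range onto the full range of the display. A minimal sketch, assuming the observed minimum and maximum are used as the end points (the text does not say whether percentiles or extremes are taken) and with a hypothetical function name:

```python
import numpy as np


def stretch_contrast(image, out_min=0, out_max=255):
    """Map the observed intensity range linearly onto [out_min, out_max]."""
    img = np.asarray(image, dtype=np.float64)
    lo, hi = img.min(), img.max()
    if hi == lo:
        # A constant image carries no contrast to stretch.
        return np.full(img.shape, out_min, dtype=np.uint8)
    scaled = (img - lo) / (hi - lo) * (out_max - out_min) + out_min
    return np.rint(scaled).astype(np.uint8)


narrow = np.array([[100, 110], [120, 150]])
print(stretch_contrast(narrow))
```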

After  this  preprocessing,  several  Gabor  transforms  are  taken  of  the 
image.  The  required  Gabor  wavelet  parameters  which  must  be  specified  are 
orientation,  Gaussian  envelope  amplitude  and  width,  sinusoidal  modulation 
frequency  and  wave  type  (sine  or  cosine).  Since  this  application  uses  spatial 
domain  correlation,  a  decimation  factor  can  also  be  specified. 
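The parameters listed above determine a Gabor wavelet, and the Gabor transform is then a dot product of that wavelet with the image at each retained point. The sketch below illustrates this with NumPy; it is not the authors' code. The names `gabor_kernel` and `gabor_correlate`, the isotropic Gaussian envelope, the odd kernel size, and the edge-replicating border are all assumptions. The angle is taken here as the direction along which the sinusoid varies.

```python
import numpy as np


def gabor_kernel(size, sigma, wavelength, theta_deg, wave="cosine", amplitude=1.0):
    """Gaussian envelope times a sinusoid varying along direction theta_deg.

    `size` is assumed odd so that the kernel has a central pixel.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(np.float64)
    theta = np.deg2rad(theta_deg)
    u = x * np.cos(theta) + y * np.sin(theta)     # coordinate along the modulation
    envelope = amplitude * np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    phase = 2.0 * np.pi * u / wavelength
    carrier = np.cos(phase) if wave == "cosine" else np.sin(phase)
    return envelope * carrier


def gabor_correlate(image, kernel, decimation=1):
    """Dot product of kernel and image at every `decimation`-th pixel."""
    img = np.asarray(image, dtype=np.float64)
    kh, kw = kernel.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    rows = range(0, img.shape[0], decimation)
    cols = range(0, img.shape[1], decimation)
    out = np.empty((len(rows), len(cols)))
    for a, i in enumerate(rows):
        for b, j in enumerate(cols):
            out[a, b] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out


# Vertical stripes of period 4 pixels respond to a matched horizontal modulation.
stripes = np.cos(2.0 * np.pi * np.arange(16) / 4.0)[None, :].repeat(16, axis=0)
matched = gabor_correlate(stripes, gabor_kernel(9, 2.0, 4.0, 0.0))
crossed = gabor_correlate(stripes, gabor_kernel(9, 2.0, 4.0, 90.0))
print(abs(matched[8, 8]) > abs(crossed[8, 8]))
```

The explicit double loop keeps the correspondence with "dot product at each point" visible; a decimation factor of d reduces the number of dot products by roughly d² at the cost of spatial resolution in the output plane.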

Gabor  filters  respond  strongly  to  linear  features  oriented  parallel  to 
the  filter's  principal  axis.  The  strength  of  their  response  is  also  dependent 
on  the  relative  size  of  the  object  and  its  components  or  texture  compared 
to  the  modulation  pitch  of  the  Gabor  wavelet.  Figure  B.3  is  an  example 
of  correlating  a  Gabor  pattern  with  a  45°  orientation  with  the  image  in 
Figure  B.2.  The  majority  of  features  on  the  VLSI  circuits  are  the  metal 
connective  "wires"  and  are  horizontal  and  vertical;  the  vias  and  the  contacts 
appear as circular features. Therefore, we used a set of Gabor filters with
rotations of 20, 45, 70, 110, 135, and 160 degrees. Orientations of 0 and 90
degrees and their multiples were avoided in order to suppress the wires
and help enhance the circular elements. Not all rotations were required for
all images.

The  angles  chosen  cause  the  Gabor  filters  to  respond  strongly  to  the 
edges  of  the  contacts  and  the  corners  of  other  features,  but  not  to  many  of 
the  other  chip  features.  The  Gaussian  width  and  the  sinusoidal  modulation 
of  the  wavelets  are  chosen  to  match  the  expected  contact  size  in  the  scene. 
This  reduces  the  response  of  the  filters  to  many  corners  and  other  distractors. 
When the set of Gabor filters of varied rotation is applied to the scene and
combined correctly, the response of the contacts dominates the resulting
feature set.

The transformed scenes are combined by a localized non-linear thresholding
operation. In the Gabor-transformed scenes, the pixel values are
limited to (−127 to 127). Each input image will produce a set of intermediate
transformed images which depend upon the rotation of the Gabor filters.
For each pixel in the combined image, the pixels with the same (x, y) coordinates
in each of the transformed images are individually examined. The
pixel with the greatest absolute value is used for the feature set (and the sign
is preserved). This technique selects the extreme values and assumes they
contain the most information. Once the feature set is assembled the pixel
values are shifted to range from 0 to 255. Figure B.4 shows the result of this
process. The pixel array shown in Figure B.4 is then thresholded to select the
maximum 5-10% of pixels. Figure B.5 shows the superposition of Figure B.1,
the original image, with these maximum pixels derived from thresholding
Figure B.4. Each of the bright spots is a potential location for a via or contact.
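The combination rule described above (keep, at each pixel, the signed value of largest magnitude across the orientation planes, shift to an unsigned range, then keep the top few percent) can be sketched as follows. The function names are hypothetical, and two details are assumptions because the text does not specify them: the shift is taken to be the addition of 128, and the top fraction is selected with a quantile threshold, so ties at the threshold may admit slightly more pixels than requested.

```python
import numpy as np


def combine_max_abs(planes):
    """Per pixel, keep the signed value with the greatest absolute value."""
    stack = np.stack([np.asarray(p, dtype=np.float64) for p in planes])
    winner = np.argmax(np.abs(stack), axis=0)           # index of the winning plane
    return np.take_along_axis(stack, winner[None, ...], axis=0)[0]


def shift_to_bytes(feature):
    """Shift signed values in -127..127 into the unsigned 0..255 range."""
    return np.clip(np.asarray(feature) + 128, 0, 255).astype(np.uint8)


def top_fraction_mask(feature, fraction=0.05):
    """Boolean mask of the brightest `fraction` of pixels."""
    threshold = np.quantile(feature, 1.0 - fraction)
    return feature >= threshold


plane_a = np.array([[10, -50], [3, 0]])
plane_b = np.array([[-20, 40], [-3, 1]])
combined = combine_max_abs([plane_a, plane_b])
print(combined)
print(shift_to_bytes(combined))
```

The mask returned by `top_fraction_mask` plays the role of the highlighted candidate locations; only those locations would be passed to the more expensive template correlation stage.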

In  typical  scenes,  the  highlighted  pixels  represent  only  a  few  percent  of  the 
original  512  x  480  array  (see  Table  2.1). 


Chip   Scene   Contacts   Contacts   Area
               Present    Detected   Covered
A      1       27         27         8.0%
A      2       45         45         72%
B      1       42         42         6.4%
B      2       54         54         8.3%
C      1       23         23         3.(y%
C      2       11         11         0.5%
C      3       24         24         14.6%

Table 2.1: Segmentation results.

To determine which of the highlighted pixels actually represent vias or
contacts, a video subimage from the original (or enhanced original) scene,
about the same size as the vias or contacts, is extracted and correlated with
a nominal template of the desired element. In this research, correlation
was usually performed by a trained neural net (connected in the
back-propagation mode) [8].

By limiting the number of such locations to be searched to a small
number of the 512 × 480 possible locations in the original image, the processing
time is greatly reduced. Many fewer false positives result and in our
experience, virtually all (96%) of the targeted forms are located.

In  a  sense  the  nonlinear  Gabor  analysis  serves  as  an  initial  selector  filter 
for  points  in  an  image  which  have  a  high  likelihood  of  containing  the  sought 
image.  Thus,  the  expensive  correlation  scheme  to  determine  if  the  target 
image  is  actually  at  some  location  need  be  carried  out  only  at  a  few  locations 
in the scene. This type of behavior seems to occur also in animal visual systems
where  actual  eye  motion  consists  of  a  sequence  of  jumps  (saccades)  driven 
by  image  content. 

3.  Segmentation  of  FLIR  images 

The second application discussed in this paper is the segmentation of
potential targets in forward looking infrared (FLIR) images. The motivation
is  similar  to  that  in  the  previous  application.  Given  an  image  with  a  large 
number  of  pixels,  is  there  some  easy  way  to  locate  the  coordinates  of  the 
most  likely  locations  of  targets  prior  to  performing  complex  target  analysis 
and  identification  procedures? 

Figure B.6 shows a representative FLIR image. It was subsequently
processed by correlating it with Gabor wavelets using four wavelet
orientations: 0°, 45°, 90°, and 135°. The pitch and orientation of the Gabor
wavelet modulation are important for these images. A Gabor wavelet can be
considered to be an anisotropic spatial filter, and Gabor transformation is an
approximation to spatially filtering an image.

Therefore, it is important to estimate the spatial frequency content of
the  targets  of  interest  and  select  Gabor  filtering  frequencies  to  highlight  these 
frequencies.  In  the  case  of  scenes  containing  targets,  approximate  range  is 
frequently  known  as  is  the  approximate  size  of  the  targets.  Therefore,  it  is 
possible  to  estimate  the  angular  extent  of  potential  targets  and  hence  select 
the  appropriate  Gabor  wavelets  for  image  plane  processing.  If  not,  then  the 
images  can  be  processed  with  a  range  of  Gabor  frequencies  and  then  post- 
processed  with  extra-image  information  to  analyze  potential  target  sites. 

The example in Figure B.7 shows the result of applying appropriate
Gabor wavelet functions. Figure B.7 has had only one non-linear operation,
namely the final thresholding of a new image created by simple addition
of corresponding pixels in the four Gabor filtered images created from
Figure B.6.

Figure B.8 shows the same image after Gabor filtering with sine rather
than cosine modulation. Notice that this is an effective edge finder without
the disadvantage of many derivative-type edge finders, in that there is no
enhancement of high spatial frequency noise. This is because the spatial
frequency components of a Gabor wavelet are grouped about a narrow band
of frequencies, so that a Gabor filter is actually an anisotropic band-pass filter.
This filter can be tuned to the type of edge of interest, as shown in Figure B.8.
This image is also the result of final thresholding of the linear summation of
four Gabor filtered images.

A. Gabor-like processes in animal visual systems

The  basic  data  input  channels  in  the  mammalian  visual  systems  are  well 
known  [9].  See  Figure  B.9.  There  are  three  basic  processing  centers.  The 
first  of  these  is  the  retina,  a  five  layer  system  in  which  the  optical  image  is 
coded  so  that  it  may  be  transmitted  through  neuron  channels  to  the  brain. 
The  retina  performs  local  contrast  and  average  intensity  normalization,  color 
coding,  motion  detection,  spatial  and  dynamic  range  data  compression,  the 
first  stage  of  log  Z  mapping  of  the  two-dimensional  retinal  image,  and  then 
transmits  local  differential  brightness  data  of  points  in  the  image  compared 
to  a  local  circular  surround  of  image  data. 

These data are mapped in a six layer system in the brain stem (lateral
geniculate nucleus (LGN)). Individual neurons can be instrumented in the LGN
by means of micro-electrodes, and it is here that we see that many of these
neurons seem to view the world as though they were Gabor-like filters
[6]. The visual data are then retransmitted in the form of a log Z map of the
optical image to the primary visual center (V1) in the cortex, and it is here that
humans are first aware of visual data. Needless to say, instrumented cells in
V1 respond as though they are components of two-dimensional Gabor-like
filters. Therefore, one concludes that the apparent world is in fact the real
world viewed through a set of spatially distributed Gabor-like filters. These
can obviously serve as texture and edge detectors for animals like us, as well
as pattern recognition machines.

Several researchers have proposed models for the mammalian visual
system which are based on Gabor's work (see [2, pp. 1164-5] and [6]). The
idea that the human visual system optimizes the available information in
both the spatial and spatial frequency domains makes intuitive sense [1,
p. 1426]. A model of the visual system which uses Gabor filters may help
resolve the long-running debate over whether the cortical (brain) cells
involved in vision perform as local feature detectors in the spatial domain, as
spatial frequency components of a Fourier-like decomposition [2, p. 1160],
or both.

Daugman modified Gabor's one-dimensional time-frequency "signals"
into two-dimensional spatial filters. Each filter is a two-dimensional
sinusoid (grating pattern) multiplied by a two-dimensional Gaussian
envelope. These filters were also shown to have optimal joint resolution in
the spatial and frequency domains [2, pp. 1162-4]. The general form of
the two-dimensional Gabor filter family in the space domain is:

$$ r(x,y) = \exp[-(x_g^2 + y_g^2)/2(\alpha^2 + \beta^2)] \, \sin[-2\pi(U_\phi x + V_\phi y) - \Psi] \tag{A.1} $$

where $(x_g, y_g)$ are coordinates for the Gaussian, $\alpha$ and $\beta$ are the Gaussian
decay terms, $U_\phi$ and $V_\phi$ express the modulation, and $\Psi$ controls the phase
of the two-dimensional sine-wave. The resulting waveform is shown in
Figure B.10.
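To make the shape of such a filter concrete, the following Python sketch samples a sine-phase Gabor pattern using the parameters quoted in the caption of Figure B.3 (34 x 34 pixels, a pitch of 17 pixels per cycle, 45° orientation). It is a simplified illustration rather than the exact filter family above: it assumes a circular Gaussian envelope with a single hypothetical width `sigma`, and the function name and sign conventions are choices made for this example.

```python
import math


def gabor_kernel(size=34, pitch=17.0, theta_deg=45.0, sigma=8.0, phase=0.0):
    """Sample a sine-phase 2-D Gabor pattern on a size x size grid centred
    on the origin. `sigma` (envelope width in pixels) is an assumed value."""
    theta = math.radians(theta_deg)
    u = math.cos(theta) / pitch  # modulation frequency along x, cycles/pixel
    v = math.sin(theta) / pitch  # modulation frequency along y, cycles/pixel
    half = (size - 1) / 2.0
    kernel = []
    for row in range(size):
        y = row - half
        line = []
        for col in range(size):
            x = col - half
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            carrier = math.sin(2.0 * math.pi * (u * x + v * y) + phase)
            line.append(envelope * carrier)
        kernel.append(line)
    return kernel


kernel = gabor_kernel()
```

With zero phase the pattern is odd-symmetric about the grid centre, so its samples sum to (numerically) zero; this is the property that makes the sine-phase filter insensitive to uniform brightness and responsive to edges.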

One example of a visual system model is given by Jones and Palmer [6,
pp. 1233-58]. The hypothesis in their paper is that the visual fields of typical
mammalian (cat) visual (brain) cells behave as though they were linear filters
having the functional form of the two-dimensional Gabor filters. Jones
and Palmer obtained the two-dimensional spatial responses and temporal
responses of 36 instrumented cells from cat cortices. They used a simplex
algorithm to find Gabor filters which best fit the response profiles in a
least-squared-error sense. The error between the spatial response profiles and
their corresponding two-dimensional Gabor filters was then calculated.

The study showed that 33 of 36 spatial responses and 34 of 36 temporal
responses showed no statistical difference from a Gabor filter. The authors
concluded that the Gabor filter appears to have evolved as an optimal strategy
for sampling images simultaneously in the two-dimensional spatial and
spatial frequency domains [6, p. 1233]. The brain visual system, however,
is not necessarily a linear system, so it is possible that Jones' and Palmer's
data may be a consequence of the specific and simple test stimuli applied to
the cells rather than a general and robust description of the visual system.
Nonetheless, their results strongly suggest that Gabor filtered images are part
of the computational routines used by vertebrate animals to segment images.
Other authors have focused on texture discrimination, which requires
simultaneous measurement in both space and frequency domains [10, p. 71].
Turner approaches the texture discrimination problem from the aspect of
information representation [10, p. 72]. If an image is represented as
single-valued pixels, global texture information is not specifically demonstrated,
but if a global Fourier transform is used, local texture information is missing.
Turner therefore developed a set of spatially localized Gabor filters and used
them to segment textural features. His filters were circularly symmetric
and non-self-similar; that is, the Gaussian envelope had fixed size but the
frequency of the modulated sinusoid was allowed to vary [10, p. 74].

B. Figures

Figure B.1: Typical photomicrograph of a portion of a VLSI chip.

Figure B.2: Result of preprocessing the image in Figure B.1.

Figure B.3: The result of correlating ("Gabor transforming") the
image in Figure B.2 with a two-dimensional Gabor pattern. Note
that the image is printed on a 512 x 512 pixel space and that the
Gabor patterns are 34 x 34 pixels; the pitch of the modulation is 17
pixels per cycle and is phased as a sine modulation (to provide edge
enhancement). The orientation is 45°.

Figure B.4: Result of performing the non-linear min-max pixel
selection procedure on Gabor filtered versions of Figure B.1. There
were six of these images resulting from Gabor filtering at 20°, 45°,
70°, 110°, 135°, and 160°.

Figure B.5: Superposition of Figure B.1 with the maximum/
minimum points thresholded from Figure B.4. Note that these
points lie primarily on vias or contacts. They represent only a few
percent of the original 512 x 480 array of pixels.

Figure B.6: Typical FLIR image showing a tank, APC, target board
and truck.

Figure B.7: The result of adding four (cosine) Gabor filtered images
derived from Figure B.6 with subsequent thresholding.

Figure B.9: Basic input data channels in the mammalian visual

Several, hopefully useful, observations concerning the topic in the title
are discussed: (i) It is noted that the axiom of intersection is not essential in
the definition of a multiresolution analysis. (ii) Several conditions, which
are often easily verifiable, are given for scaling sequences which imply that
such sequences generate scaling functions whose supports give rise to
non-overlapping tilings of $\mathbb{R}^n$.


1.  Introduction 

The point of this lecture is to communicate several observations which may
be useful to investigators and other individuals who work with
multiresolution analyses. These observations concern two topics: one pertains to
the axiom list for a multiresolution analysis and the other has to do with the
characterization of certain scaling functions.

Recall that a multiresolution analysis is a sequence $\{V_j\}_{j \in \mathbb{Z}}$ of closed
subspaces of $L^2(\mathbb{R}^n)$ which enjoy certain properties, see [1, 2, 3, 8, 9]. One
of the properties is the following:

$$ \bigcap_{j \in \mathbb{Z}} V_j = \{0\}. \tag{1.1} $$

Property (1.1) is often a nuisance to verify; see, for instance, [?, 7]. The
reason for this may be the notion that the property depends intrinsically on
the specific example. Fortunately this is not the case.

In this lecture we show that property (1.1) is a consequence of the other
properties enjoyed by multiresolution analyses. Thus its appearance in the
definition is unnecessary and redundant.


† Partially supported by a grant from the Air Force Office of Scientific Research,
AFOSR-90-311.

Another topic we will touch upon here concerns multiresolution
analyses whose scaling functions are characteristic functions. This matter was
recently studied in [3, 4]. The characterization of scaling sequences which
give rise to such scaling functions is an important problem in these studies.
Here we give several conditions which imply that a given sequence has the
desired properties and which are relatively easy to verify for many examples.

2.  On  axioms  for  a  multiresolution  analysis 


2.1.  The  main  observation 

Suppose $A$ is a linear transformation on $\mathbb{R}^n$. We say that $A$ is an acceptable
dilation for $\mathbb{Z}^n$ if it satisfies the following properties:

•  $A$ leaves $\mathbb{Z}^n$ invariant.

•  All the eigenvalues $\lambda_i$ of $A$ satisfy $|\lambda_i| > 1$.

These properties imply that $q = |\det A|$ is an integer which is $\ge 2$. In what
follows we will always assume that $A$ is an acceptable dilation for $\mathbb{Z}^n$.

Proposition 2.1. Suppose $\{V_j\}_{j \in \mathbb{Z}}$ is a sequence of closed subspaces of $L^2(\mathbb{R}^n)$
which enjoys the following properties:

•  $f(x)$ is in $V_j$ if and only if $f(Ax)$ is in $V_{j+1}$.

•  There is a function $\varphi$ in $V_0$ such that $\{\varphi(x - k)\}_{k \in \mathbb{Z}^n}$ is a complete
orthonormal system for $V_0$.

If $P_j f$ denotes the orthogonal projection of $f$ into $V_j$ then

$$ \lim_{j \to -\infty} \|P_j f\| = 0 \tag{2.1} $$

for all $f$ in $L^2(\mathbb{R}^n)$.

Note that (2.1) implies

$$ \bigcap_{j \in \mathbb{Z}} V_j = \{0\}. \tag{2.2} $$

Since the properties enjoyed by the sequence of subspaces in Proposition 2.1
are also enjoyed by all multiresolution analyses, see [1, 2, 3, 8, 9], we may
make the following conclusion:

Corollary to 2.1. If $\{V_j\}_{j \in \mathbb{Z}}$ is a multiresolution analysis then property (2.2)
is a consequence of the other properties enjoyed by $\{V_j\}_{j \in \mathbb{Z}}$.

2.2. Details

Proposition 2.1 is an easy consequence of the formula for the Fourier
transform of $P_j f$:

$$ \widehat{P_j f}(\xi) = \sum_{k \in \mathbb{Z}^n} \hat{f}(\xi - 2\pi B^j k)\, \overline{\hat{\varphi}(B^{-j}\xi - 2\pi k)}\, \hat{\varphi}(B^{-j}\xi) \tag{2.3} $$

where $B = A^*$ is the adjoint of $A$. Here $\hat{g}$ denotes the Fourier transform of $g$
which, for an integrable function, is defined by

$$ \hat{g}(\xi) = \int_{\mathbb{R}^n} e^{-i\langle x, \xi \rangle} g(x)\, dx $$

and distributionally otherwise.

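In what follows we will use the notation

$$ \mathrm{per}_j(g(\xi)) = \sum_{k \in \mathbb{Z}^n} g(\xi - 2\pi B^j k). $$

With this notation (2.3) may be re-expressed as

$$ \widehat{P_j f}(\xi) = \mathrm{per}_j\big(\hat{f}(\xi)\, \overline{\hat{\varphi}(B^{-j}\xi)}\big)\, \hat{\varphi}(B^{-j}\xi). $$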

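To see (2.1), use Plancherel's formula, formula (2.3), and the fact that
$P_j$ is an orthogonal projection to write

$$ \|P_j f\|^2 = \frac{1}{(2\pi)^n} \int_{\mathbb{R}^n} \mathrm{per}_j\big(\hat{f}(\xi)\, \overline{\hat{\varphi}(B^{-j}\xi)}\big)\, \hat{\varphi}(B^{-j}\xi)\, \overline{\hat{f}(\xi)}\, d\xi \tag{2.4} $$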


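Observe that

$$ \big|\mathrm{per}_j\big(\hat{f}(\xi)\, \overline{\hat{\varphi}(B^{-j}\xi)}\big)\big| \le \big\{\mathrm{per}_j(|\hat{f}(\xi)|^2)\big\}^{1/2} \big\{\mathrm{per}_j(|\hat{\varphi}(B^{-j}\xi)|^2)\big\}^{1/2} $$

and since

$$ \mathrm{per}_j(|\hat{\varphi}(B^{-j}\xi)|^2) = 1, $$

by virtue of the fact that $\{\varphi(x - k)\}_{k \in \mathbb{Z}^n}$ is an orthonormal system
in $L^2(\mathbb{R}^n)$, we may conclude that

$$ \|P_j f\|^2 \le \frac{1}{(2\pi)^n} \int_{\mathbb{R}^n} \big\{\mathrm{per}_j(q^j |\hat{f}(\xi)|^2)\big\}^{1/2}\, \big|q^{-j/2} \hat{\varphi}(B^{-j}\xi) \hat{f}(\xi)\big|\, d\xi \tag{2.5} $$

where $q = |\det B|$.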

Observe that $\mathrm{per}_j(q^j |\hat{f}(\xi)|^2)$ is essentially an approximating Riemann
sum for $\int_{\mathbb{R}^n} |\hat{f}(\xi - 2\pi\eta)|^2\, d\eta$. Hence if $\hat{f}$ is continuous with compact support
then if $j \le 0$,

$$ \mathrm{per}_j(q^j |\hat{f}(\xi)|^2) \le C \tag{2.6} $$

where $C$ is a constant which depends on $f$ but not on $j$. Thus, in view of (2.5)
and (2.6), for such $f$ we may write

$$ \|P_j f\|^2 \le C \int_{\mathbb{R}^n} \big|q^{-j/2} \hat{\varphi}(B^{-j}\xi) \hat{f}(\xi)\big|\, d\xi \tag{2.7} $$

whenever $j \le 0$, where $C$ is a constant independent of $j$. Now, if $\hat{f}$ vanishes
in a neighborhood of the origin, say $\{\xi : |\xi| < \epsilon\}$, we may write

$$ \int_{\mathbb{R}^n} \big|q^{-j/2} \hat{\varphi}(B^{-j}\xi) \hat{f}(\xi)\big|\, d\xi \le \|\hat{f}\| \Big\{ \int_{|\xi| \ge \epsilon} q^{-j} |\hat{\varphi}(B^{-j}\xi)|^2\, d\xi \Big\}^{1/2} \tag{2.8} $$

and note that, since $|\hat{\varphi}(\xi)|^2$ is in $L^1(\mathbb{R}^n)$, the integral involving $\hat{\varphi}$ on the right
hand side of (2.8) goes to zero as $j \to -\infty$. Thus from (2.7) and (2.8) we
may conclude that (2.1) holds for all $f$ such that $\hat{f}$ is continuous, compactly
supported, and vanishes in a neighborhood of the origin. Since such $f$ are
dense in $L^2(\mathbb{R}^n)$ and $\|P_j\| \le 1$, we may make the stronger conclusion that
(2.1) holds for all $f$ in $L^2(\mathbb{R}^n)$.


3.  Tiles  and  scaling  functions 


3.1.  Background 

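Suppose $A$ is an acceptable dilation for $\mathbb{Z}^n$ and $\mathcal{K}$ is a collection of distinct
representatives of $\mathbb{Z}^n / A\mathbb{Z}^n$. Recall that the number of elements in $\mathcal{K}$ is $q$,
where $q = |\det A|$, and that

$$ \mathbb{Z}^n = \bigcup_{k \in \mathcal{K}} \{k + A\mathbb{Z}^n\} $$

where the terms in this union are pairwise disjoint.

Let

$$ Q = \Big\{x \in \mathbb{R}^n : x = \sum_{j=1}^{\infty} A^{-j} k_j,\ k_j \in \mathcal{K}\Big\}. \tag{3.1} $$

The set $Q$ satisfies the following properties:

$$ AQ \simeq \bigcup_{k \in \mathcal{K}} \{k + Q\}, \tag{3.2} $$

$$ \bigcup_{k \in \mathbb{Z}^n} \{k + Q\} \simeq \mathbb{R}^n, \tag{3.3} $$

$$ \{k_1 + Q\} \cap \{k_2 + Q\} \simeq \emptyset \ \text{ whenever } k_1, k_2 \in \mathcal{K} \text{ and } k_1 \ne k_2. \tag{3.4} $$

Here, $S \simeq T$ means that $|T \setminus S| = |S \setminus T| = 0$ where $|S|$ denotes the Lebesgue
measure of $S$. If $Q$ enjoys

$$ \{k + Q\} \cap Q \simeq \emptyset \ \text{ for all } k \text{ in } \mathbb{Z}^n \setminus \{0\}, \tag{3.5} $$

which may be stronger than (3.4), then the characteristic function of $Q$ is a
scaling function for a multiresolution analysis.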

In a recent paper [3] the authors studied such scaling functions and
gave several conditions which are equivalent to (3.5). Unfortunately, in
many interesting examples, none of these conditions may be particularly
convenient to test. On the other hand, most of the time one is only interested
in sufficient conditions on $A$ and $\mathcal{K}$ to ensure that the set $Q$ satisfies (3.5). In
what follows we give several such sufficient conditions which, in appropriate
cases, are relatively easy to verify.


3.2.  Main  results 

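To avoid unpleasant technical complications in what follows we always
assume that $\mathcal{K}$ contains 0.

For any nonnegative integer $N$ let

$$ \mathcal{A}\mathcal{K}_N = \sum_{j=0}^{N} A^j \mathcal{K}. \tag{3.6} $$

Thus $\mathcal{A}\mathcal{K}_N$ is a finite subset of $\mathbb{Z}^n$ consisting of $q^{N+1}$ sums $k$ of the form

$$ k = \sum_{j=0}^{N} A^j k_j \tag{3.7} $$

where the $k_j$'s are in $\mathcal{K}$. Let

$$ \mathcal{A}\mathcal{K}_\infty = \bigcup_{N=0}^{\infty} \mathcal{A}\mathcal{K}_N \tag{3.8} $$

and let

$$ \mathcal{D}\mathcal{A}\mathcal{K}_\infty = \mathcal{A}\mathcal{K}_\infty - \mathcal{A}\mathcal{K}_\infty. \tag{3.9} $$

In other words every element in $\mathcal{A}\mathcal{K}_\infty$ is a finite sum of the form (3.7) for
some $N$ and every element $k$ in $\mathcal{D}\mathcal{A}\mathcal{K}_\infty$ is of the form

$$ k = k_1 - k_2 \tag{3.10} $$

where $k_1$ and $k_2$ are in $\mathcal{A}\mathcal{K}_\infty$. We are now ready to state the promised
conditions.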

Proposition 3.1. If $\mathcal{D}\mathcal{A}\mathcal{K}_\infty = \mathbb{Z}^n$ then $Q$ satisfies (3.5).

Let $b = \max\{|k| : k \in \mathcal{K}\}$ where $|k|$ denotes the Euclidean norm of $k$, let

$$ a^{-1} = \sup\{|A^{-1}x| : x \in \mathbb{R}^n \text{ and } |x| = 1\}, $$

and let $\mathcal{B} = \{k \in \mathbb{Z}^n : |k| < 2ab/(a - 1)\}$. In terms of this notation we may
state the following:

Proposition 3.2. If $\mathcal{B} \subset \mathcal{D}\mathcal{A}\mathcal{K}_\infty$ then $Q$ satisfies (3.5).

Sometimes it is possible to obtain an estimate of $|Q|$, the measure of $Q$.
In such a case the following may be useful:

Proposition 3.3. If $|Q| < 2$ then $Q$ satisfies (3.5).


3.3.  Examples 

We apply the above results to some of the examples considered in [3] where
they were handled by verifying Cohen's condition.

Example 3.4.

Let $n = 1$, $A = 3$, and $\mathcal{K} = \{0, 1, 5\}$. Then $a = 3$, $b = 5$, and $\mathcal{B} = \{k :
|k| < 5\}$. Since $0, 1 \in \mathcal{K}$ and $\mathcal{D}\mathcal{A}\mathcal{K}_\infty = -\mathcal{D}\mathcal{A}\mathcal{K}_\infty$, it suffices to check that 2,
3 and 4 are in $\mathcal{D}\mathcal{A}\mathcal{K}_\infty$. The identities $2 = 5 - 3 \cdot 1$, $3 = 3 \cdot 1$, and $4 = 5 - 1$
imply that 2, 3 and 4 are in $\mathcal{D}\mathcal{A}\mathcal{K}_\infty$, so that we may apply Proposition 3.2 to
conclude that the corresponding $Q$ satisfies (3.5).
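The membership check in Example 3.4 is easy to automate in the one-dimensional case. The Python sketch below enumerates the digit sums of the form (3.7) for a fixed $N$ and then forms all pairwise differences as in (3.10); the function names are hypothetical, and the code only treats scalar dilations such as $A = 3$.

```python
from itertools import product


def digit_sums(a, digits, n_max):
    """All sums k = sum_{j=0}^{n_max} a**j * k_j with each k_j in `digits`
    (one-dimensional case). Because 0 is a digit, this set also contains
    every sum of the same form with fewer terms."""
    return {sum(a ** j * kj for j, kj in enumerate(ks))
            for ks in product(digits, repeat=n_max + 1)}


def differences(values):
    """All differences k1 - k2 with k1 and k2 taken from `values`."""
    return {x - y for x in values for y in values}


sums = digit_sums(3, (0, 1, 5), 1)   # the case N = 1
diffs = differences(sums)
covered = all(k in diffs for k in range(-4, 5))
```

For this example `covered` is true already with $N = 1$, and `sums` has $3^2 = 9$ distinct elements, in line with the count of sums of the form (3.7).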


Example 3.5.

Let $n = 2$,


and $\mathcal{K} =$


In this case $a = \sqrt{2}$ and $b = 1$. It is not difficult to verify that $\mathcal{B} \subset \mathcal{D}\mathcal{A}\mathcal{K}_\infty$.
Hence the corresponding $Q$ satisfies (3.5) by virtue of Proposition 3.2.

Let $n = 2$,


$A =$


In this case it is quite transparent that $\mathcal{D}\mathcal{A}\mathcal{K}_\infty = \mathbb{Z}^n$, so that Proposition 3.1
can be applied directly to conclude that the corresponding $Q$ satisfies (3.5).


Example 3.6.

Let $n = 2$,

and let $\mathcal{K}$ be the set whose elements are the columns of

$$ \begin{pmatrix} 0 & 1 & 2 & 0 & 0 & 1 & 2 & 2 & 4 \\ 0 & 0 & 0 & 1 & 2 & 2 & 1 & 2 & 4 \end{pmatrix}. $$

To see that $\mathcal{D}\mathcal{A}\mathcal{K}_\infty = \mathbb{Z}^n$ it suffices to show that


is in $\mathcal{D}\mathcal{A}\mathcal{K}_\infty$. But this is clear since


Hence Proposition 3.1 can be applied directly to conclude that the
corresponding $Q$ satisfies (3.5).

Remark 3.7. Numerical results corresponding to some of the above examples
easily imply that $|Q| < 2$. For instance, in example (3.11) it is clear that $Q$ is
contained in a triangle of area 3/2. Hence in this case Proposition 3.3 can be
applied to conclude that $Q$ satisfies (3.5).

Example 3.8.

Let $n = 2$,


let $\mathcal{K}_1$ be the set whose elements are the columns of

$$ \begin{pmatrix} 0 & 0 & 0 & 1 & 1 \\ 0 & 1 & 2 & 1 & 2 \end{pmatrix}, $$

let $\mathcal{K}_2$ be the set whose elements are the columns of

$$ \begin{pmatrix} 0 & 0 & 0 & 1 & 1 \\ 0 & 1 & 2 & 1 & -1 \end{pmatrix}, $$

and let $\mathcal{K}_3$ be the set whose elements are the columns of

$$ \begin{pmatrix} 0 & 0 & 0 & 1 & 1 \\ 0 & -4 & 2 & 1 & -1 \end{pmatrix}. $$

The corresponding tiles are plotted in Figure 3.1, Figure 3.2, Figure 3.3.
Property (3.5) can be verified by applying Proposition 3.1, Proposition 3.2,
or Proposition 3.3.

Figure 3.1: Tile generated by $\mathcal{K}_1$ in Example 3.8.

Figure 3.2: Tile generated by $\mathcal{K}_2$ in Example 3.8.

Figure 3.3: Tile generated by $\mathcal{K}_3$ in Example 3.8.

3.4. Details

Proof (of Proposition 3.1). Recall that (3.4) says that

$$ \{k_1 + Q\} \cap \{k_2 + Q\} \simeq \emptyset \tag{3.12} $$

whenever $k_1$ and $k_2$ are in $\mathcal{A}\mathcal{K}_0$ and $k_1 \ne k_2$. Hence

$$ A(k_1 + Q) \cap A(k_2 + Q) \simeq \emptyset \tag{3.13} $$

for such $k_j$. In view of (3.2) we may write

$$ A(k_j + Q) \simeq Ak_j + \bigcup_{k \in \mathcal{K}} \{k + Q\}. $$

Since the union in the right hand side of the above identity is taken over
pairwise disjoint sets, we may conclude that this identity together with
(3.13) imply (3.12) whenever $k_1$ and $k_2$ are in $\mathcal{A}\mathcal{K}_1$. By induction it is clear
that (3.12) is valid whenever $k_1$ and $k_2$ are in $\mathcal{A}\mathcal{K}_N$ for any non-negative
integer $N$. In other words, since $k \in \mathcal{D}\mathcal{A}\mathcal{K}_\infty$ can be expressed as $k = k_1 - k_2$
with $k_1$ and $k_2$ in $\mathcal{A}\mathcal{K}_N$ for some $N$, we may conclude the following:

Lemma 3.9. If $k \in \mathcal{D}\mathcal{A}\mathcal{K}_\infty$ and $k \ne 0$ then

$$ \{k + Q\} \cap Q \simeq \emptyset. $$

This implies the desired result. ∎

Proof (of Proposition 3.2). If $B_r = \{x \in \mathbb{R}^n : |x| \le r\}$, then a routine
estimate shows that $Q \subset B_r$ whenever $r \ge ab/(a - 1)$. Since

$$ \{k + B_r\} \cap B_r \simeq \emptyset $$

whenever $|k| \ge 2r$, we may conclude that in order to show that $Q$ satisfies
(3.5) for all $k$ in $\mathbb{Z}^n$ it suffices to check that $Q$ satisfies (3.5) for all $k$ in $\mathcal{B}$.
This, of course, is the case when $\mathcal{B} \subset \mathcal{D}\mathcal{A}\mathcal{K}_\infty$. ∎

Proof (of Proposition 3.3). The result is a transparent consequence of
Theorem 2 in [3]. We recall some of the details. In what follows $\chi_S$ denotes
the characteristic function of the set $S$.

Let $Q_N$, $N = 0, 1, 2, \ldots$, be the sequence of sets defined as follows:

•  $Q_0 = [-1/2, 1/2]^n$

•  $Q_N = \bigcup_{k \in \mathcal{K}} A^{-1}(k + Q_{N-1})$, $N = 1, 2, \ldots$

Then the characteristic functions of $Q_N$ satisfy

•  $\sum_{k \in \mathbb{Z}^n} \chi_{Q_N}(x - k) = 1$ a.e.

•  for all functions $\varphi$ which are continuous and bounded on $\mathbb{R}^n$

$$ \lim_{N \to \infty} \int_{\mathbb{R}^n} \chi_{Q_N}(x) \varphi(x)\, dx = \frac{1}{|Q|} \int_{\mathbb{R}^n} \chi_Q(x) \varphi(x)\, dx $$

The last two items imply that

$$ \sum_{k \in \mathbb{Z}^n} \chi_Q(x - k) = |Q| \ \text{ a.e.} \tag{3.14} $$

Since $1 \le |Q| < \infty$ and the sum in (3.14) is integer-valued for all $x$, we may
conclude the following:

Lemma 3.10. $|Q|$ is equal to a positive integer.

Hence the estimate $|Q| < 2$ implies that

$$ |Q| = 1. \tag{3.15} $$

Since (3.15) is equivalent to (3.5), see Lemma 1 in [3], the argument is
complete. ∎


4.  Miscellaneous  remarks 

The observations leading to Proposition 2.1 were made while I was preparing
a draft of [6] and was confronted with the task of verifying (1.1) for a
particularly unpleasant example. A review of the literature indicates that
the general idea, however, is at least implicit in earlier work on the subject.
For example, a variant of (2.1) may be found in [2].

Some of the observations which eventually led to Proposition 3.1 and
Proposition 3.2 were made during a discussion with Stuart Nelson, who
provided significant input. Wayne Lawton kindly provided me with a copy
of [4], discussed some of the material therein, and brought Lemma 3.10 to
my attention via a different argument.

The concept of innovations is introduced as the base of the orthonormal
representation of a random process and the result is used to simplify the
estimation of the spectrum of an ARMA process. The ARMA model is
conceptually justified in terms of the principle of maximum entropy generalized
in the context of entropy rate.


1.  Factorization  and  innovations 


In  the  following,  we  present  a  number  of  fundamental  concepts  related  to 
the  orthonormal  representation  of  stochastic  processes  and  we  illustrate  the 
results  with  a  variety  of  topics  of  theoretical  and  applied  interest.  The  paper 
is mostly tutorial. To make it self-contained, we briefly review the early
concepts [7].

A discrete-time stochastic process is a sequence $x_n$ or $x[n]$ of random
variables (RVs) defined for every integer $n$. We shall assume that it is a real
stationary process with zero mean. The autocorrelation $R[m]$ of $x[n]$ is the
expected value of the product $x[n+m]\,x[n]$:

$$ R[m] = E\{x[n+m]\,x[n]\} \tag{1.1} $$

The power spectrum $S(e^{j\omega})$ of $x[n]$ is the discrete Fourier transform (DFT)
of $R[m]$:

$$ S(e^{j\omega}) = \sum_{m=-\infty}^{\infty} R[m]\, e^{-jm\omega} \qquad R[m] = \frac{1}{2\pi} \int_{-\pi}^{\pi} S(e^{j\omega})\, e^{jm\omega}\, d\omega \tag{1.2} $$

The process $x[n]$ is called white noise if the RVs $x[n]$ and $x[n+m]$ are
uncorrelated for every $m \ne 0$, that is, if

$$ R[m] = P\,\delta[m] = \begin{cases} P, & m = 0 \\ 0, & m \ne 0 \end{cases} \qquad S(e^{j\omega}) = P \tag{1.3} $$

Systems. A linear time-invariant system is an operator assigning to a given
process $x[n]$ (input) the process (output)

$$ y[n] = \sum_{k=-\infty}^{\infty} x[n-k]\, h[k] = x[n] * h[n] \tag{1.4} $$

Thus, $y[n]$ is the discrete convolution of $x[n]$ with the delta response $h[n]$ of
the system.

The z-transform

$$ H(z) = \sum_{n=-\infty}^{\infty} h[n]\, z^{-n} \tag{1.5} $$

of $h[n]$ is the system function.

With $R_{xy}[m] = E\{x[n+m]\,y[n]\}$ and $R_{yy}[m] = E\{y[n+m]\,y[n]\}$, it follows
from (1.4) that

$$ R_{xy}[m] = R_{xx}[m] * h[-m] \qquad R_{yy}[m] = R_{xy}[m] * h[m] $$
$$ S_{xy}(z) = S_{xx}(z) H(1/z) \qquad S_{yy}(z) = S_{xy}(z) H(z) \tag{1.6} $$

1.1.  Spectral  factorization 

A function $L(z)$ is called minimum phase if it and its inverse $\Gamma(z) = 1/L(z)$
are analytic for $|z| > 1$:

$$ L(z) = \sum_{n=0}^{\infty} l[n]\, z^{-n} \qquad \Gamma(z) = \sum_{n=0}^{\infty} \gamma[n]\, z^{-n} \tag{1.7} $$

If $S(e^{j\omega})$ is the spectrum of a regular process $x[n]$ satisfying the Paley-Wiener
condition [4]

$$ \int_{-\pi}^{\pi} |\ln S(e^{j\omega})|\, d\omega < \infty \tag{1.8} $$

then we can find a minimum-phase function $L(z)$ such that

$$ S(z) = L(z) L(1/z) \tag{1.9} $$

The determination of the function $L(z)$ is simple if the given spectrum
$S(z)$ is rational: $S(z) = A(z)/B(z)$. We factor the polynomials $A(z)$ and
$B(z)$ and form the polynomials $N(z)$ and $D(z)$ using only the roots $|z_i| < 1$
(Fejér-Riesz theorem):

$$ S(z) = \frac{A(z)}{B(z)} = \frac{N(z) N(1/z)}{D(z) D(1/z)} \qquad L(z) = \frac{N(z)}{D(z)} \tag{1.10} $$

Example 1.1. We wish to factor the spectrum

$$ S(e^{j\omega}) = \frac{5 - 4\cos\omega}{10 - 6\cos\omega} \qquad S(z) = \frac{5 - 2(z + z^{-1})}{10 - 3(z + z^{-1})} $$

Clearly,

$$ S(z) = \frac{2(z - 1/2)(z - 2)}{3(z - 1/3)(z - 3)} \qquad \text{hence} \qquad L(z) = \frac{2z - 1}{3z - 1} $$

1.1.1.  Innovations 

From (1.6) and (1.9) it follows that if $x[n]$ is the input to the system $\Gamma(z)$
(Figure 1.1), the resulting output $i[n]$ is white noise with spectrum $S_{ii}(z)$:

$$ S_{ii}(z) = S(z) \Gamma(z) \Gamma(1/z) = 1 \qquad R_{ii}[m] = \delta[m] \tag{1.11} $$

The process $i[n]$ so formed is called the innovations of $x[n]$. Thus,

$$ i[n] = \sum_{k=0}^{\infty} \gamma[k]\, x[n-k] \qquad E\{i[n+m]\, i[n]\} = \delta[m] \tag{1.12} $$

x[n] -> Γ(z) -> i[n] -> L(z) -> x[n]

Γ(z): whitening filter
L(z): innovations filter

Figure 1.1: Whitening and innovations filter.


Cascading the system $\Gamma(z)$ (whitening filter) with its inverse $L(z)$
(innovations filter) as in Figure 1.1, we conclude that the resulting output equals
$x[n]$. This shows that $x[n]$ is the output of the filter $L(z)$ with input $i[n]$:

$$ x[n] = \sum_{k=0}^{\infty} l[k]\, i[n-k] \tag{1.13} $$

We have thus shown that a regular process $x[n]$ is linearly equivalent
to a white noise process $i[n]$ in the sense that each can be expressed linearly
in terms of the other and its past, as in (1.12) and (1.13). This is the extension
of the Gram-Schmidt orthonormalization to stochastic processes. We give
next several applications.
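As a small numerical illustration of the whitening and innovations filters, the following Python sketch computes the first few coefficients $l[n]$ and $\gamma[n]$ for the factor $L(z) = (2z - 1)/(3z - 1) = (2 - z^{-1})/(3 - z^{-1})$ of Example 1.1 and confirms that cascading the two filters gives a unit impulse. The helper names and the truncation length of 20 terms are assumptions made for the example.

```python
def impulse_response(b, a, n_terms):
    """First n_terms of h[n] for the causal rational system
    H(z) = (b[0] + b[1] z^-1 + ...) / (a[0] + a[1] z^-1 + ...),
    obtained by running the corresponding difference equation."""
    h = []
    for n in range(n_terms):
        acc = b[n] if n < len(b) else 0.0
        for k in range(1, min(n, len(a) - 1) + 1):
            acc -= a[k] * h[n - k]
        h.append(acc / a[0])
    return h


def convolve(p, q, n_terms):
    """First n_terms of the convolution of two causal sequences
    (both must have at least n_terms entries)."""
    return [sum(p[k] * q[n - k] for k in range(n + 1)) for n in range(n_terms)]


l = impulse_response([2.0, -1.0], [3.0, -1.0], 20)      # innovations filter L(z)
gamma = impulse_response([3.0, -1.0], [2.0, -1.0], 20)  # whitening filter 1/L(z)
cascade = convolve(gamma, l, 20)                        # should be a unit impulse
```

Both sequences decay geometrically (with ratios 1/3 and 1/2 respectively), which reflects the fact that the pole and the zero of $L(z)$ lie inside the unit circle.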

2. Linear prediction

Linear prediction is the LMS estimation of the present value $x[n]$ of a
stochastic process by a linear function of its past values. The result is a direct
application of the projection theorem in Hilbert space. In terms of RVs this
theorem can be phrased as follows:

We wish to estimate an RV $x_0$ in terms of $n$ RVs, $x_1, \ldots, x_n$ (data). The
desired estimate is the sum

$$ \hat{x}_0 = a_1 x_1 + \cdots + a_n x_n \tag{2.1} $$

Our objective is to determine the constants $a_i$ so as to minimize the MS value

$$ P = E\{(x_0 - \hat{x}_0)^2\} \tag{2.2} $$

of the estimation error $x_0 - \hat{x}_0$. Clearly, $P$ is minimum if

$$ \frac{\partial P}{\partial a_i} = -2E\{[x_0 - (a_1 x_1 + \cdots + a_n x_n)]\, x_i\} = 0 \qquad i = 1, \ldots, n \tag{2.3} $$

This yields a system of $n$ equations expressing the unknowns $a_i$ in terms of
the second order moments $E\{x_i x_j\}$ of the $n + 1$ RVs $x_0, \ldots, x_n$.

The system (2.3) can be written in the following form:

$$ E\{e\, x_i\} = 0 \qquad i = 1, \ldots, n \qquad \text{and} \qquad e = x_0 - \hat{x}_0 \tag{2.4} $$

This result, known as the orthogonality principle, states that $P$ is minimum
if the estimation error $e$ is orthogonal to the data $x_i$.

Note that

$$ E\{e\, \hat{x}_0\} = 0 \qquad P = E\{(x_0 - \hat{x}_0)\, x_0\} \tag{2.5} $$

2.1. The Yule-Walker equations

Now we consider the problem of estimating the present value $x[n]$ of a stochastic process in terms of its $N$ most recent past values $x[n-k]$. Our estimate is the sum

$$\hat{x}_N[n] = \sum_{k=1}^{N} a_{N,k}\, x[n-k] \tag{2.6}$$

as in (2.1). This is the output of the FIR (finite impulse response) filter

$$H_N(z) = a_{N,1} z^{-1} + \cdots + a_{N,N} z^{-N} \tag{2.7}$$

with input $x[n]$. To find the coefficients $a_{N,k}$, we apply the orthogonality principle (2.4). This yields

$$E\{(x[n] - \hat{x}_N[n])\, x[n-m]\} = 0 \qquad m = 1, \ldots, N \tag{2.8}$$

Hence,

$$\sum_{k=1}^{N} a_{N,k}\, R[m-k] = R[m] \qquad m = 1, \ldots, N \tag{2.9.1}$$

The above is a system of $N$ equations (Yule-Walker) and its solution yields the $N$ unknowns $a_{N,k}$.

With $a_{N,k}$ so determined, the resulting LMS error $P_N$ equals (see (2.5) and (2.6))

$$P_N = R[0] - \sum_{k=1}^{N} a_{N,k}\, R[k] \tag{2.9.2}$$

2.1.1. Levinson's algorithm

To solve the system (2.9) directly, we must invert the matrix of its coefficients (covariance matrix). This is a Toeplitz matrix whose inversion can be simplified. Next we present a simple recursive method (Levinson's algorithm [3]) yielding the $N + 1$ unknowns $P_N$ and $a_{N,k}$. Following the standard notation, we set $K_N = a_{N,N}$. It follows from (2.9) that

$$a_{1,1} = \frac{R[1]}{R[0]} = K_1 \qquad P_1 = R[0] - a_{1,1} R[1] \tag{2.10}$$

Suppose that we have determined the $N - 1$ coefficients $a_{N-1,k}$ and the corresponding LMS error $P_{N-1}$. It can be shown that [6]

$$P_{N-1} K_N = R[N] - \sum_{k=1}^{N-1} a_{N-1,k}\, R[N-k] \tag{2.11.1}$$

$$a_{N,N} = K_N \qquad a_{N,k} = a_{N-1,k} - K_N\, a_{N-1,N-k} \qquad 1 \le k \le N-1 \tag{2.11.2}$$

$$P_N = (1 - K_N^2)\, P_{N-1} \tag{2.11.3}$$

The first equation yields $K_N$; the second is used to find the $N$ parameters $a_{N,k}$; the third equation determines $P_N$. The iteration starts with (2.10).

Note that $|K_N| \le 1$ because $P_N \ge 0$. Thus, $P_N$ is a nonincreasing sequence of numbers tending to a positive limit $P$.
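The recursion (2.10)-(2.11) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `levinson` and `yule_walker_direct` and the sample autocorrelation `R_demo` (taken as $0.8^{|m|} + 0.5(-0.3)^{|m|}$) are assumptions introduced here, not part of the text. The direct solver is included only to check the recursion against the Toeplitz system (2.9.1).

```python
import numpy as np

def levinson(R, N):
    """Solve the Yule-Walker system (2.9.1) of order N by the recursion (2.10)-(2.11).

    R holds the autocorrelation values R[0], ..., R[N].
    Returns (a, P, K) with a[k-1] = a_{N,k}, P = P_N and K[n-1] = K_n.
    """
    R = np.asarray(R, dtype=float)
    a = np.array([R[1] / R[0]])            # (2.10): a_{1,1} = K_1
    P = R[0] - a[0] * R[1]                 # (2.10): P_1
    K = [a[0]]
    for n in range(2, N + 1):
        # (2.11.1): P_{n-1} K_n = R[n] - sum_k a_{n-1,k} R[n-k]
        k_n = (R[n] - np.dot(a, R[n - 1:0:-1])) / P
        # (2.11.2): a_{n,k} = a_{n-1,k} - K_n a_{n-1,n-k}, and a_{n,n} = K_n
        a = np.concatenate([a - k_n * a[::-1], [k_n]])
        # (2.11.3): P_n = (1 - K_n^2) P_{n-1}
        P = (1.0 - k_n ** 2) * P
        K.append(k_n)
    return a, P, np.array(K)

def yule_walker_direct(R, N):
    """Reference solution: build the Toeplitz matrix of (2.9.1) and solve it directly."""
    R = np.asarray(R, dtype=float)
    T = np.array([[R[abs(i - j)] for j in range(N)] for i in range(N)])
    a = np.linalg.solve(T, R[1:N + 1])
    return a, R[0] - np.dot(a, R[1:N + 1])   # (2.9.2)

# Hypothetical autocorrelation values R[0..3], used only for illustration.
R_demo = [1.5, 0.65, 0.685, 0.4985]
a_lev, P_lev, K_lev = levinson(R_demo, 3)
a_dir, P_dir = yule_walker_direct(R_demo, 3)
```

Both routes should agree up to rounding; the recursion needs on the order of $N^2$ operations, whereas a general-purpose solver needs on the order of $N^3$.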


2.2. The Wiener filter

As $N \to \infty$, the FIR predictor of $x[n]$ tends to the predictor

$$\hat{x}[n] = \sum_{k=1}^{\infty} h[k]\, x[n-k] \qquad H(z) = \sum_{n=1}^{\infty} h[n]\, z^{-n} \tag{2.12}$$

and (2.9) tends to the infinite system (Wiener-Hopf equations)

$$\sum_{k=1}^{\infty} h[k]\, R[m-k] = R[m] \qquad m \ge 1 \tag{2.13.1}$$

$$P + \sum_{k=1}^{\infty} h[k]\, R[k] = R[0] \tag{2.13.2}$$

involving the unknowns $h[k]$ and $P$. We shall solve the system (2.13) indirectly using innovations. From the linear equivalence of the process $x[n]$ and its innovations $i[n]$ it follows that the predictor $\hat{x}[n]$ of $x[n]$ can be written as the response of a linear filter $H_i(z)$ with input $i[n]$:

$$\hat{x}[n] = \sum_{k=1}^{\infty} h_i[k]\, i[n-k] \qquad H_i(z) = \sum_{n=1}^{\infty} h_i[n]\, z^{-n} \tag{2.14}$$

To find $h_i[k]$, we apply the orthogonality principle (2.4):

$$E\Big\{\Big(x[n] - \sum_{k=1}^{\infty} h_i[k]\, i[n-k]\Big)\, i[n-m]\Big\} = 0 \qquad m \ge 1 \tag{2.15}$$

Since (see (1.12) and (1.13))

$$E\{x[n]\, i[n-m]\} = l[m] \qquad E\{i[n-k]\, i[n-m]\} = \delta[m-k]$$

(2.15) yields

$$h_i[m] = l[m] \qquad \hat{x}[n] = \sum_{k=1}^{\infty} l[k]\, i[n-k] \tag{2.16}$$

This shows that the estimate $\hat{x}[n]$ of $x[n]$ is the response of the filter

$$H_i(z) = \sum_{k=1}^{\infty} l[k]\, z^{-k} = L(z) - l[0] \tag{2.17}$$

to the input $i[n]$.

To complete the specification of the Wiener filter $H(z)$ it suffices to express $i[n]$ in terms of $x[n]$. The process $i[n]$ is the response of the whitening filter $1/L(z)$ to the input $x[n]$. Cascading with $H_i(z)$ we obtain the one-step predictor of Figure 2.1, so that $H(z) = H_i(z)/L(z) = 1 - l[0]/L(z)$.

Figure 2.1: One-step predictor.

Example 2.1. Suppose that $x[n]$ is the process of Example 1.1. In this case,

$$L(z) = \frac{2z - 1}{3z - 1} \qquad l[0] = \lim_{z \to \infty} L(z) = \frac{2}{3}$$

$$H(z) = \frac{-z^{-1}}{6 - 3z^{-1}} \qquad \hat{x}[n] = \frac{1}{2}\,\hat{x}[n-1] - \frac{1}{6}\, x[n-1]$$


2.2.1. The Kolmogoroff-Szegö MS error formula

From (2.16) and (1.13) it follows that the estimation error equals

$$e[n] = x[n] - \hat{x}[n] = l[0]\, i[n]$$

And since $E\{i^2[n]\} = 1$, this yields

$$P = E\{e^2[n]\} = l^2[0] \tag{2.18}$$

We shall express this error directly in terms of the power spectrum $S(e^{j\omega}) = |L(e^{j\omega})|^2$ of $x[n]$. The function $\ln L(z)$ is analytic for $|z| > 1$. From this it follows that [1]

$$\ln l^2[0] = \frac{1}{2\pi} \int_{-\pi}^{\pi} \ln |L(e^{j\omega})|^2\, d\omega$$

hence

$$P = \exp\Big\{\frac{1}{2\pi} \int_{-\pi}^{\pi} \ln S(e^{j\omega})\, d\omega\Big\} \tag{2.19}$$
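Formula (2.19) is easy to check numerically. The sketch below assumes, purely for illustration, the minimum-phase innovations filter $L(z) = (2z - 1)/(3z - 1)$, for which $l[0] = 2/3$ and therefore $P = l^2[0] = 4/9$ by (2.18). The function names and the grid size are choices made here, not taken from the text.

```python
import numpy as np

def ks_error(S, num_points=4096):
    """Evaluate (2.19): P = exp{(1/2pi) * integral of ln S over (-pi, pi)}.

    The integral of a smooth periodic function is approximated by the
    mean of its samples on a uniform grid.
    """
    w = -np.pi + 2.0 * np.pi * np.arange(num_points) / num_points
    return np.exp(np.mean(np.log(S(w))))

def S_example(w):
    """Power spectrum |L(e^{jw})|^2 for the assumed L(z) = (2z - 1)/(3z - 1)."""
    z = np.exp(1j * w)
    L = (2.0 * z - 1.0) / (3.0 * z - 1.0)
    return np.abs(L) ** 2

P_ks = ks_error(S_example)   # should be close to l[0]^2 = 4/9
```

The agreement between the spectral integral and $l^2[0]$ relies on $L(z)$ having all its poles and zeros inside the unit circle, as assumed for the innovations filter.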


3. Spectral estimation

A process $x[n]$ is called ARMA (autoregressive-moving average) if its spectrum $S(z)$ is rational as in (1.10):

$$S(z) = L(z)L(1/z) \qquad L(z) = \frac{b_0 + b_1 z^{-1} + \cdots + b_M z^{-M}}{1 + a_1 z^{-1} + \cdots + a_N z^{-N}} = \frac{N(z)}{D(z)} \tag{3.1}$$

In this case, $x[n]$ satisfies the recursion equation

$$x[n] + a_1 x[n-1] + \cdots + a_N x[n-N] = b_0 i[n] + \cdots + b_M i[n-M] \tag{3.2}$$

where $i[n]$ is its innovations. We shall determine the $N + M + 1$ parameters $a_i$ and $b_k$ of $L(z)$ in terms of the first $N + M + 1$ values $R[0], \ldots, R[M+N]$ of the autocorrelation $R[m]$ of $x[n]$.

The process $x[n-m]$ is linearly dependent on $i[n-m]$ and its past; furthermore $i[n]$ is white noise with $E\{i^2[n]\} = 1$. Multiplying (3.2) by $x[n-m]$ and taking expected values, we conclude that

$$R[m] + a_1 R[m-1] + \cdots + a_N R[m-N] = 0 \qquad m > M \tag{3.3}$$

Setting $m = M+1, \ldots, M+N$, we obtain a system of $N$ equations. Its solution yields the $N$ unknowns $a_k$. To complete the specification of $L(z)$ we need to find its numerator $N(z)$.

AR processes. If $M = 0$, then $x[n]$ is an autoregressive process and

$$L(z) = \frac{b_0}{D(z)} \qquad b_0 = \lim_{z \to \infty} L(z) \tag{3.4}$$

In this case (see (3.2))

$$x[n] + a_1 x[n-1] + \cdots + a_N x[n-N] = b_0 i[n] \tag{3.5}$$

and (3.3) is reduced to the Yule-Walker equations (2.9.1) if we set $a_{N,k} = -a_k$. Solving, we obtain $D(z)$. To determine the constant $b_0$, we multiply (3.5) by $i[n]$ and take expected values. This yields $E\{x[n] i[n]\} = b_0 E\{i^2[n]\} = b_0$. Multiplying (3.5) by $x[n]$ and using the above, we obtain

$$R[0] + a_1 R[1] + \cdots + a_N R[N] = b_0^2 \tag{3.6}$$

This completes the determination of $L(z)$.
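As an illustration of the AR case, the sketch below generates the autocorrelation of a known AR(2) model, then recovers its parameters from $R[0], R[1], R[2]$ by solving (3.3) with $M = 0$ and using the relation $R[0] + a_1 R[1] + \cdots + a_N R[N] = b_0^2$ obtained by multiplying (3.5) by $x[n]$. The model coefficients, the truncation length `n_terms` and the function names are assumptions made for this example.

```python
import numpy as np

def ar_autocorrelation(a, b0, max_lag, n_terms=2000):
    """R[0..max_lag] of x[n] + a_1 x[n-1] + ... + a_N x[n-N] = b0 i[n].

    Uses R[m] = sum_k l[k] l[k+m], with the impulse response l[k] of
    L(z) = b0 / D(z) truncated after n_terms samples.
    """
    N = len(a)
    l = np.zeros(n_terms)
    for n in range(n_terms):
        acc = b0 if n == 0 else 0.0
        for k in range(1, min(n, N) + 1):
            acc -= a[k - 1] * l[n - k]
        l[n] = acc
    return np.array([np.dot(l[:n_terms - m], l[m:]) for m in range(max_lag + 1)])

def fit_ar(R, N):
    """Solve (3.3) with M = 0 for a_1..a_N, then find b0 from b0^2 = R[0] + sum a_k R[k]."""
    R = np.asarray(R, dtype=float)
    # Row m, column k holds R[m - k] = R[|m - k|], for m, k = 1..N.
    T = np.array([[R[abs(m - k)] for k in range(1, N + 1)] for m in range(1, N + 1)])
    a = np.linalg.solve(T, -R[1:N + 1])
    b0 = np.sqrt(R[0] + np.dot(a, R[1:N + 1]))
    return a, b0

# Hypothetical stable AR(2) model: D(z) = (1 - 0.5 z^-1)(1 - 0.4 z^-1).
a_true, b0_true = [-0.9, 0.2], 2.0
R = ar_autocorrelation(a_true, b0_true, max_lag=2)
a_est, b0_est = fit_ar(R, N=2)
```

In practice $R[m]$ would be estimated from data rather than computed from a known model; the round trip here only confirms that the equations are mutually consistent.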

MA processes. If $N = 0$, then $x[n]$ is the moving average of its innovations:

$$x[n] = b_0 i[n] + \cdots + b_M i[n-M] \tag{3.7}$$

$$L(z) = b_0 + b_1 z^{-1} + \cdots + b_M z^{-M} \tag{3.8}$$

In this case (see (3.3)), $R[m] = 0$ for $|m| > M$; hence, $S(z)$ can be expressed directly in terms of $R[m]$:

$$S(e^{j\omega}) = \sum_{m=-M}^{M} R[m]\, e^{-jm\omega} = |L(e^{j\omega})|^2 \qquad L(e^{j\omega}) = \sum_{m=0}^{M} b_m\, e^{-jm\omega}$$

Thus to find $L(z)$, it suffices to factor the function $S(z)$ as in (1.9). This method involves the determination of the roots of $S(z)$. We discuss later a method that avoids factorization.

ARMA processes. Suppose, finally, that $x[n]$ is an ARMA process as in (3.1). As we have shown, the denominator $D(z)$ can be determined from (3.3) in terms of the $N$ values $R[M+1], \ldots, R[M+N]$ of $R[m]$. With $a_k$ so determined, we form the process [2]

$$y[n] = x[n] + a_1 x[n-1] + \cdots + a_N x[n-N] \tag{3.9}$$

This is the left side of (3.2). Clearly, $y[n]$ is the response of a system with input $x[n]$ and system function $D(z)$. Hence,

$$S_{yy}(z) = S_{xx}(z)\, D(z)\, D(1/z) = N(z)\, N(1/z) \tag{3.10}$$

From the above it follows that $N(z)$ is the innovations filter of $y[n]$ and it shows that $y[n]$ is an MA process. To find $N(z)$, it suffices, therefore, to find the autocorrelation of $y[n]$ and proceed as in the MA case. Clearly,

$$D(z)\, D(1/z) = \sum_{m=-N}^{N} \rho[m]\, z^{-m} \tag{3.11}$$

To determine $\rho[m]$, we form the product on the left and equate coefficients. This yields, with $a_0 = 1$,

$$\rho[m] = \sum_{k=|m|}^{N} a_{k-|m|}\, a_k \quad \text{for } |m| \le N \text{ and } 0 \text{ otherwise.}$$

Convolving with the inverse $R[m]$ of $S_{xx}(z)$ we obtain

$$R_{yy}[m] = \sum_{k=-N}^{N} R[m-k]\, \rho[k] \quad \text{for } |m| \le M \text{ and } 0 \text{ otherwise.} \tag{3.12}$$

The determination of an ARMA spectrum involves thus the following steps:

1) We find the constants $a_k$, solving the system (3.3).

2) We compute $R_{yy}[m]$ from (3.12).

3) We factor the corresponding spectrum

$$S_{yy}(z) = \sum_{m=-M}^{M} R_{yy}[m]\, z^{-m} = N(z)\, N(1/z)$$

Note that the system (3.3) cannot be solved with Levinson's algorithm because it holds only for $m > M \ge 2$.
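Steps 1 and 2 of the ARMA procedure can be sketched as follows for a hypothetical ARMA(1,1) model. The sketch solves (3.3) for $m = M+1, \ldots, M+N$, forms $\rho[m]$ as the autocorrelation of the coefficients of $D(z)$, and evaluates $R_{yy}[m]$ by the convolution (3.12). The spectral factorization of step 3 is not attempted. All function names and the model coefficients are assumptions introduced here.

```python
import numpy as np

def arma_autocorrelation(a, b, max_lag, n_terms=2000):
    """R[0..max_lag] of the ARMA process (3.2), from the truncated impulse response of L(z)."""
    N, M = len(a), len(b) - 1
    l = np.zeros(n_terms)
    for n in range(n_terms):
        acc = b[n] if n <= M else 0.0
        for k in range(1, min(n, N) + 1):
            acc -= a[k - 1] * l[n - k]
        l[n] = acc
    return np.array([np.dot(l[:n_terms - m], l[m:]) for m in range(max_lag + 1)])

def ar_part(R, N, M):
    """Step 1: solve (3.3) for m = M+1, ..., M+N to obtain a_1..a_N."""
    R = np.asarray(R, dtype=float)
    T = np.array([[R[abs(m - k)] for k in range(1, N + 1)]
                  for m in range(M + 1, M + N + 1)])
    return np.linalg.solve(T, -R[M + 1:M + N + 1])

def ma_part_autocorrelation(R, a, M):
    """Step 2: R_yy[m], m = 0..M, from (3.11)-(3.12); R must hold R[0..M+N]."""
    R = np.asarray(R, dtype=float)
    N = len(a)
    d = np.concatenate([[1.0], a])           # coefficients of D(z), with a_0 = 1
    rho = np.correlate(d, d, mode="full")    # rho[k] for k = -N..N, stored at index k + N
    return np.array([
        sum(rho[k + N] * R[abs(m - k)] for k in range(-N, N + 1))
        for m in range(M + 1)
    ])

# Hypothetical ARMA(1,1): x[n] - 0.5 x[n-1] = i[n] + 0.4 i[n-1].
a_true, b_true = [-0.5], [1.0, 0.4]
R = arma_autocorrelation(a_true, b_true, max_lag=3)
a_est = ar_part(R, N=1, M=1)
# Ask for one lag beyond M to see that R_yy vanishes there.
Ryy = ma_part_autocorrelation(R, a_est, M=2)
```

For this model $y[n] = i[n] + 0.4\, i[n-1]$, so one expects $R_{yy}[0] = 1.16$, $R_{yy}[1] = 0.4$ and $R_{yy}[2] = 0$.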

4. Entropy rate

Given a partition $A$ of a probability space $S$, consisting of $N$ events $A_i$, we form the sum

$$H(A) = -\sum_{i=1}^{N} p_i \ln p_i \qquad p_i = P(A_i) \tag{4.1}$$

This sum is by definition the entropy of the partition $A$. Since $p_1 + \cdots + p_N = 1$ and $p_i \ge 0$, it follows that

$$0 \le H(A) \le \ln N$$

The maximum is reached if $p_1 = \cdots = p_N = 1/N$ and the minimum if $p_r = 1$ for some $r$. This justifies the use of the entropy as a measure of uncertainty about the occurrence of the events $A_i$ in a single trial: if $p_r = 1$, then our uncertainty is zero because, almost certainly, only the event $A_r$ will occur. If $p_i = 1/N$, our uncertainty is maximum. We give next an empirical interpretation of the concept of entropy. Our objective, however, is a method of estimating the spectrum of a process based on the principle of maximum entropy.



4.1. Typical sequences

In the space $S^n$ of repeated trials, we form the event

$$B = \{A_i \text{ occurs } n_i \text{ times in a specific order}\} \tag{4.2}$$

The probability of this event equals

$$P(B) = p_1^{n_1} \cdots p_N^{n_N} \tag{4.3}$$

If we perform the underlying physical experiment $n$ times and the event $A_i$ occurs $n_i$ times, then, almost certainly

$$p_i \approx n_i / n \tag{4.4}$$

provided that $n$ is sufficiently large. In the space $S^n$ there are $N^n$ sequences of the form (4.2). From (4.4) it follows that almost certainly, the elements $t \in T$ of the subset

$$T = \{A_i \text{ occurs } n_i \approx n p_i \text{ times in a specific order}\} \tag{4.5}$$

of $B$ occur. These elements will be called typical sequences.

With $n_i \approx n p_i$, (4.3) yields

$$P(t) \approx p_1^{n p_1} \cdots p_N^{n p_N} = e^{-nH(A)} \tag{4.6}$$

Thus, all typical sequences have the same probability. Denoting by $n_t$ the total number of such sequences, we conclude from (4.6) that

$$n_t P(t) \approx P(T) \approx 1 \qquad n_t \approx e^{nH(A)} \tag{4.7}$$

If the events $A_i$ are not equally likely, then $H(A) < \ln N$; hence, for large $n$, $nH(A) \ll n \ln N$. From this it follows that

$$n_t \approx e^{nH(A)} \ll e^{n \ln N} = N^n$$

Thus, the number of typical sequences is much smaller than the number $N^n$ of all possible sequences even though almost certainly, only typical sequences will occur because the probability $P(T)$ of their union $T$ is almost one. This property of typical sequences is important in coding theory [8] and it gives an empirical interpretation of the principle of maximum entropy.
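The estimate $n_t \approx e^{nH(A)}$ can be compared with an exact count. The sketch below computes $H(A)$ from (4.1) and compares $nH(A)$ with the logarithm of the multinomial coefficient $n!/(n_1! \cdots n_N!)$, which counts the orderings in which each $A_i$ occurs exactly $n_i = n p_i$ times. The probabilities, the value of $n$ and the function names are assumptions chosen for illustration.

```python
import math

def partition_entropy(p):
    """H(A) = -sum p_i ln p_i as in (4.1), with 0 ln 0 taken as 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def log_typical_count(p, n):
    """ln n_t according to the approximation n_t ~ exp(n H(A)) of (4.7)."""
    return n * partition_entropy(p)

def log_exact_count(counts):
    """ln of n!/(n_1! ... n_N!): the number of sequences with exactly these counts."""
    n = sum(counts)
    return math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)

# Hypothetical partition with three events and n = 1000 trials.
p = [0.5, 0.3, 0.2]
n = 1000
counts = [500, 300, 200]                  # n_i = n p_i
approx = log_typical_count(p, n)          # n H(A)
exact = log_exact_count(counts)           # ln of the exact count
log_all = n * math.log(len(p))            # ln N^n, all possible sequences
```

On the logarithmic scale the two counts agree to within about one percent for this $n$, and both lie well below $n \ln N$.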
4.2. Entropy of RVs

Suppose that $x$ is a discrete-type RV taking the values $x_i$ with probability $p_i$. The events $A_i = \{x = x_i\}$ form a partition $A_x$ of $S$. The entropy of this partition is by definition the entropy $H(x)$ of the RV $x$:

$$H(x) = H(A_x) = -\sum_i p_i \ln p_i \tag{4.8}$$

From this it follows that

$$H(x) = -E\{\ln f(x)\} \tag{4.9}$$

where $f(x)$ is a function equal to $p_i$ for $x = x_i$ and 0 elsewhere. Extending (4.9) to continuous-type RVs, we define the entropy of an RV $x$ similarly:

$$H(x) = -E\{\ln f(x)\} = -\int_{-\infty}^{\infty} f(x) \ln f(x)\, dx \tag{4.10}$$

where $f(x)$ is the density of $x$.

Example 4.1. If $x$ is a normal RV with zero mean and variance $\sigma^2$, then

$$H(x) = -E\Big\{\ln\Big[\frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2/2\sigma^2}\Big]\Big\} = \ln \sigma\sqrt{2\pi} + \frac{1}{2} = \ln \sigma\sqrt{2\pi e}$$

Example 4.2. If $f(x) = c\,e^{-cx}$ for $x > 0$ and 0 otherwise, then $E\{cx\} = 1$; hence

$$H(x) = -E\{\ln c - cx\} = -\ln c + 1$$

The conditional entropy of $y$ assuming $x$ is by definition

$$H(y|x) = -E\{\ln f(y|x)\} = -\iint f(x, y) \ln f(y|x)\, dx\, dy \tag{4.11}$$

This is the measure of uncertainty about $y$ assuming that $x$ has been observed.

The entropy of a random vector $X = [x_1, \ldots, x_n]$ and the conditional entropy of $Y = [y_1, \ldots, y_m]$ assuming $X$ are defined similarly:

$$H(X) = -E\{\ln f(X)\} \qquad H(Y|X) = -E\{\ln f(Y|X)\} \tag{4.12}$$

4.3. Entropy rate

The $m$-th order entropy of a stochastic process $x_n$ is the entropy $H(x_n, x_{n-1}, \ldots, x_{n-m+1})$ of a block of $m$ consecutive samples $x_{n-k}$, $0 \le k \le m-1$, of $x_n$. The ratio $H(x_n, \ldots, x_{n-m+1})/m$ is the average uncertainty per sample in a block of $m$ consecutive samples. The limit

$$H(x) = \lim_{m \to \infty} \frac{1}{m} H(x_n, \ldots, x_{n-m+1}) \tag{4.13}$$

is the entropy rate of the process $x_n$.

It can be shown that [7]

$$H(x) = \lim_{m \to \infty} H(x_n | x_{n-1}, \ldots, x_{n-m}) \tag{4.14}$$

Thus $H(x)$ is the uncertainty about the present of $x_n$ assuming that its entire past is observed.

If $x_n$ is a normal process, then

$$H(x) = \ln \sqrt{2\pi e} + \frac{1}{4\pi} \int_{-\pi}^{\pi} \ln S(e^{j\omega})\, d\omega \tag{4.15}$$

Note finally that if $x_n$ is the input to a linear system with system function $L(z)$, then the entropy rate $H(y)$ of the resulting output equals

$$H(y) = H(x) + \frac{1}{2\pi} \int_{-\pi}^{\pi} \ln |L(e^{j\omega})|\, d\omega \tag{4.16}$$

For normal processes, this follows readily from (4.15) because $S_{yy}(e^{j\omega}) = S_{xx}(e^{j\omega})\, |L(e^{j\omega})|^2$. The proof of the general case is more difficult [3].


4.4. The principle of maximum entropy

Consider a partition $A$ consisting of $N$ events $A_i$ as in (4.1). Suppose that we know nothing about the probabilities $p_i$ of these events. The maximum entropy (ME) principle states that in this case, the unknowns $p_i$ must be such as to maximize the entropy $H(A)$ of $A$. Since $p_1 + \cdots + p_N = 1$, this leads to the conclusion that the events $A_i$ must be equally likely. If prior information about the probabilities $p_i$ is available, then $p_i$ must be such as to maximize $H(A)$ subject to the constraints resulting from the prior information.

Example 4.3. We are given a die and we wish to estimate the probability $p_i$ of its faces. In the absence of any prior information, we conclude that $p_i = 1/6$. Suppose, however, that the probability that {even} shows equals 0.4. In this case, the constants $p_i$ are such as to maximize the sum $-(p_1 \ln p_1 + \cdots + p_6 \ln p_6)$, subject to the conditions

$$p_1 + p_2 + \cdots + p_6 = 1 \qquad p_2 + p_4 + p_6 = 0.4$$

This yields

$$p_2 = p_4 = p_6 = 2/15 \qquad p_1 = p_3 = p_5 = 1/5$$
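The die example can be checked with a short sketch. It assumes, as the concavity of the entropy suggests, that the maximizing distribution spreads the probability of {even} equally over the faces 2, 4, 6 and the remainder equally over the faces 1, 3, 5, and it compares the resulting entropy with that of another distribution satisfying the same two constraints. The function names and the comparison distribution `p_other` are hypothetical.

```python
import math

def entropy(p):
    """-sum p_i ln p_i, with 0 ln 0 taken as 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def me_die(p_even_total):
    """Faces 1..6: equal shares within the odd group and within the even group."""
    p_even = p_even_total / 3.0
    p_odd = (1.0 - p_even_total) / 3.0
    return [p_odd, p_even, p_odd, p_even, p_odd, p_even]

p_me = me_die(0.4)

# Another assignment with P{even} = 0.4 and total probability 1 (illustrative only).
p_other = [0.3, 0.1, 0.2, 0.2, 0.1, 0.1]

h_me = entropy(p_me)
h_other = entropy(p_other)
```

Any unequal split within a group lowers the entropy, which is why the even faces share 0.4 equally and the odd faces share 0.6 equally.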

The empirical justification of the ME principle can be expressed in terms of the concept of typical sequences: the unknown constants $p_i$ must be such as to maximize the number $n_t$ of the sequences formed with the elements $A_i$ of the partition $A$ that are likely to occur (see (4.7)).

4.4.1. Constraints as expected values

We shall use the ME principle to estimate the density $f(x)$ of an RV $x$ under the assumption that the expected values $\eta_i$ of $n$ known functions $g_i(x)$ of $x$ are given:

$$E\{g_i(x)\} = \int_{-\infty}^{\infty} g_i(x) f(x)\, dx = \eta_i \qquad i = 1, \ldots, n \tag{4.17}$$

In this case, our problem is to find a positive function $f(x)$ of unit area such as to maximize the integral

$$H(x) = -\int_{-\infty}^{\infty} f(x) \ln f(x)\, dx \tag{4.18}$$

subject to the constraints (4.17). It is easy to show that the solution to this problem is an exponential:

$$f(x) = \frac{1}{Z} \exp\{-\lambda_1 g_1(x) - \cdots - \lambda_n g_n(x)\} \tag{4.19}$$

where

$$Z = \int_{-\infty}^{\infty} \exp\{-\lambda_1 g_1(x) - \cdots - \lambda_n g_n(x)\}\, dx \tag{4.20}$$

The $n$ parameters $\lambda_i$ are determined from (4.17).
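A brief sketch of why the maximizing density is exponential, using Lagrange multipliers. Here $\eta_i$ denotes the given expected values in (4.17), and the multiplier $\lambda_0 - 1$ attached to the unit-area condition is notation introduced only for this sketch:

```latex
\begin{aligned}
J[f] &= -\int f \ln f \, dx
        - (\lambda_0 - 1)\Big(\int f \, dx - 1\Big)
        - \sum_{i=1}^{n} \lambda_i \Big(\int g_i f \, dx - \eta_i\Big) \\
\frac{\delta J}{\delta f(x)} &= -\ln f(x) - 1 - (\lambda_0 - 1)
        - \sum_{i=1}^{n} \lambda_i g_i(x) = 0 \\
\Rightarrow \quad f(x) &= e^{-\lambda_0}
        \exp\Big\{-\sum_{i=1}^{n} \lambda_i g_i(x)\Big\},
        \qquad Z = e^{\lambda_0}
\end{aligned}
```

The unit-area condition then fixes $Z$ as the integral of the exponential, and the remaining multipliers follow from the $n$ moment constraints.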


Example 4.4. Estimate the density $f(x)$ of a positive RV $x$ with known mean. In this problem, $n = 1$,

$$g(x) = x \qquad E\{x\} = \eta \qquad f(x) = 0 \text{ for } x < 0$$

and (4.19) yields

$$f(x) = \frac{1}{Z}\, e^{-\lambda x} \qquad \lambda = \frac{1}{Z} = \frac{1}{\eta}$$

Example 4.5. Estimate $f(x)$ if $E\{x^2\} = m_2$. With $g(x) = x^2$, (4.19) yields

$$f(x) = \frac{1}{Z}\, e^{-\lambda x^2} \qquad \lambda = \frac{1}{2 m_2} \qquad Z = \sqrt{2\pi m_2}$$

Thus, if the second moment of an RV is known, then its ME density is normal with zero mean.

The preceding results can be readily extended to random vectors.


4.4.2. Spectral estimation

Using the ME principle, we shall estimate the power spectrum $S(z)$ of a stochastic process $x_n$ under the assumption that the first $2N + 1$ values

$$R[m] = E\{x_{n+m}\, x_n\} \qquad |m| \le N \tag{4.21}$$

of its autocorrelation are known. This problem was solved in Section 3 under the assumption that $S(z)$ is rational (see (3.1)). In the following, we make no prior assumptions. We show that, under the given constraints, the ME principle leads to the conclusion that the process $x_n$ is normal and autoregressive.

In this problem, the constraints (4.21) are second order moments. From this it follows as in Example 4.5 that $x_n$ is a normal process and its entropy rate equals (see (4.15))

$$H(x) = \ln \sqrt{2\pi e} + \frac{1}{4\pi} \int_{-\pi}^{\pi} \ln S(e^{j\omega})\, d\omega \tag{4.22}$$

The maximization of the entropy of $x_n$ of any order is equivalent to the maximization of its entropy rate $H(x)$. Hence, to solve our problem, it suffices to maximize the integral in (4.22) subject to the constraints (4.21). Since

$$S(e^{j\omega}) = \sum_{m=-\infty}^{\infty} R[m]\, e^{-jm\omega} \qquad \frac{\partial S(e^{j\omega})}{\partial R[m]} = e^{-jm\omega}$$

we conclude differentiating (4.22) that $H(x)$ is maximum if

$$\frac{\partial H}{\partial R[m]} = \frac{1}{2\pi} \int_{-\pi}^{\pi} \frac{e^{-jm\omega}}{S(e^{j\omega})}\, d\omega = 0 \qquad |m| > N \tag{4.23}$$

This shows that the Fourier series coefficients of the function $1/S(e^{j\omega})$ are 0 for $|m| > N$; hence, $1/S(e^{j\omega})$ is a trigonometric polynomial:

$$\frac{1}{S(e^{j\omega})} = \sum_{m=-N}^{N} c_m\, e^{-jm\omega} \tag{4.24}$$

To complete the estimation of $S(z)$, it suffices to determine the coefficients $c_m$. We can do so, using Levinson's algorithm as in Section 3.
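The conclusion can be illustrated numerically. The sketch below fits an all-pole (AR) spectrum $b_0^2 / |D(e^{j\omega})|^2$ to given values $R[0], \ldots, R[N]$ by solving the Yule-Walker equations with a general linear solver (standing in for Levinson's algorithm), and then checks two properties: the spectrum reproduces the given autocorrelation values, and the Fourier coefficients of its reciprocal vanish beyond lag $N$. The sample values and function names are assumptions made for illustration.

```python
import numpy as np

def me_spectrum(R, num_points=2048):
    """All-pole spectrum matching R[0..N], sampled on a uniform grid over (-pi, pi)."""
    R = np.asarray(R, dtype=float)
    N = len(R) - 1
    # R[m] + sum_k a_k R[m-k] = 0 for m = 1..N
    T = np.array([[R[abs(m - k)] for k in range(1, N + 1)] for m in range(1, N + 1)])
    a = np.linalg.solve(T, -R[1:])
    b0_sq = R[0] + np.dot(a, R[1:])           # b0^2 = R[0] + sum_k a_k R[k]
    w = -np.pi + 2.0 * np.pi * np.arange(num_points) / num_points
    D = 1.0 + sum(a[k - 1] * np.exp(-1j * k * w) for k in range(1, N + 1))
    return w, b0_sq / np.abs(D) ** 2

def cosine_coefficients(w, F, max_lag):
    """(1/2pi) * integral of F(w) cos(m w) dw for m = 0..max_lag, via the grid mean."""
    return np.array([np.mean(F * np.cos(m * w)) for m in range(max_lag + 1)])

# Hypothetical known values R[0], R[1], R[2] (so N = 2).
R_given = [1.5, 0.65, 0.685]
w, S = me_spectrum(R_given)
R_back = cosine_coefficients(w, S, max_lag=2)        # should reproduce R_given
c = cosine_coefficients(w, 1.0 / S, max_lag=4)       # c_3 and c_4 should vanish
```

The values of $R[m]$ for $|m| > N$ implied by this spectrum are the extrapolation that the ME principle selects.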
The Method of Cauchy has been used to extrapolate a desired parameter over a broad range of frequencies. This information is generated using some information about the parameter over a narrow band of frequencies or at some discrete frequency points.

The approach is to assume that the parameter, as a function of frequency, is a ratio of two polynomials. The problem is to determine the order of the polynomials and the coefficients that define them.

This method can be coded as a standalone program or incorporated as part of a larger program. This technique has yielded accurate results while in use in conjunction with a Method of Moments program and as an independent program in filter analysis.


1.  Introduction 

In a host of problems in electromagnetics, it is necessary to obtain information about a system over a broad range of frequencies. In most cases it is not possible to evaluate the desired parameter in closed form. The sixties saw the development of the Method of Moments to overcome this difficulty. It was shown that the Method of Moments generated remarkably accurate solutions for a broad class of problems. The later years saw this method being refined into a popular algorithm in electromagnetics research.

The Method of Moments is an approximation technique, which converts interactions of complicated bodies into a set of smaller, easily solvable interactions. This method finds its major advantage in the widespread use of the computer. But its major drawback is that for broadband analysis the program has to be run at many frequency points. In a large system the execution time may be as long as days. Also the memory requirements in large systems can be too much for many available computer systems. Hence the time required to generate currents over a broad spectrum of frequencies may be prohibitive. In the laboratory it is not always possible to make accurate broadband measurements. This problem is especially severe in the case of measuring the transfer function of a filter in the stop band. In some cases the signal to noise ratio is too low to be confident about the measurements of filter characteristics.

These drawbacks in current methods have created a need for a technique that would generate the required information without using too much time and still yield accurate results. One possible technique is the Method of Cauchy. The approach is to approximate the currents as a function of frequency. The function chosen is a ratio of two polynomials. The problem therefore reduces to the determination of the order of the polynomials and the coefficients therein. With the polynomial coefficients at hand, one can evaluate the currents at an arbitrary number of frequency points.

A successful application of this method would result in saving significant amounts of program execution time.


2.  The  Cauchy  Method 


Let us represent the current as a ratio of two polynomials. Hence the current ($H$), as a function of frequency ($s$), is

$$H(s) = \frac{A(s)}{B(s)} \tag{2.1}$$

The numerator polynomial is of order $P$ and the denominator of order $Q$. Hence we have $P + Q + 2$ unknown coefficients. Cauchy's problem is: given $H^{(n)}(s_j)$ for $j = 1, \ldots, J$ and $n = 0, \ldots, N_j$, to find $P$, $Q$, $A(s)$ and $B(s)$.

We need the values of the current and its $N_j$ derivatives at frequency points $s_j$, $j = 1, \ldots, J$.

The solution for the coefficients is unique if the total number of samples is equal to the total number of unknown coefficients $P + Q + 2$, i.e.,

$$N \equiv \sum_{j=1}^{J} (N_j + 1) = P + Q + 2 \tag{2.2}$$

From (2.1)

$$A(s) = H(s)\, B(s) \tag{2.3}$$

Differentiating the above equation $n$ times results in the binomial expansion

$$A^{(n)}(s_j) = \sum_{i=0}^{n} {}^{n}C_i\, H^{(n-i)}(s_j)\, B^{(i)}(s_j) \tag{2.4}$$

where ${}^{n}C_i = \dfrac{n!}{(n-i)!\, i!}$. Consider $A(s) = \sum_{k=0}^{P} a_k s^k$ and $B(s) = \sum_{k=0}^{Q} b_k s^k$. Equation (2.4) can be rewritten as

$$\sum_{k=0}^{P} A_{j,n,k}\, a_k = \sum_{k=0}^{Q} B_{j,n,k}\, b_k \tag{2.5}$$

where

$$A_{j,n,k} = \frac{k!}{(k-n)!}\, s_j^{\,k-n}\, u(k-n) \tag{2.6}$$

$$B_{j,n,k} = \sum_{i=0}^{n} {}^{n}C_i\, H^{(n-i)}(s_j)\, \frac{k!}{(k-i)!}\, s_j^{\,k-i}\, u(k-i) \tag{2.7}$$

$j = 1, \ldots, J$, $n = 0, 1, \ldots, N_j$, and $u(k) = 0$ for $k < 0$ and 1 otherwise.

Define

$$A = [A_{j,n,0},\ A_{j,n,1},\ \ldots,\ A_{j,n,P}] \tag{2.8}$$

$$B = [B_{j,n,0},\ B_{j,n,1},\ \ldots,\ B_{j,n,Q}] \tag{2.9}$$

The order of matrix $A$ is $N \times (P + 1)$ and that of $B$ is $N \times (Q + 1)$.

$$[a] = [a_0, a_1, a_2, \ldots, a_P]^T \tag{2.10}$$

$$[b] = [b_0, b_1, b_2, \ldots, b_Q]^T \tag{2.11}$$

Then, equation (2.5) becomes

$$[A \mid -B] \begin{bmatrix} a \\ b \end{bmatrix} = 0 \tag{2.12}$$

Now one can do a singular value decomposition of the matrix $[A \mid -B]$. This results in the equation:

$$[U][\Sigma][V]^H \begin{bmatrix} a \\ b \end{bmatrix} = 0 \tag{2.13}$$

The matrices $U$ and $V$ are unitary matrices and $\Sigma$ is a diagonal matrix with the singular values of $[A \mid -B]$ as its entries. Given the number of nonzero singular values, we can estimate the order of the two polynomials. Given these better estimates of the polynomial orders one can recalculate the matrices $A$ and $B$. Now one can rewrite the above equation as:

$$[A \mid -B] \begin{bmatrix} a \\ b \end{bmatrix} = 0 \tag{2.14}$$

One solution is to choose the eigenvector corresponding to the minimum eigenvalue. Since the eigenvalues are in general complex, the minimum is defined as the one with the lowest absolute value.

In a computer realization of the Cauchy method, this technique could lead to errors since we may have multiple zero eigenvalues which show up as being only close to zero. The desired solution would be a linear combination of the eigenvectors corresponding to these near zero eigenvalues. This is especially true in the applications to filter analysis because the orders of the filters are important. Choosing the orders of the numerator and denominator polynomials as high values can lead to errors. One way of getting around this problem is to assume that $a_0 = 1.0$. Now equation (2.12) can be written as

$$[A_1 \mid -B] \begin{bmatrix} a_1 \\ b \end{bmatrix} = -A_0 \tag{2.15}$$

where $A_1$ is the matrix $A$ without its first column, $a_1$ is the column vector of numerator coefficients other than $a_0$, and $A_0$ is the first column of matrix $A$.

Now one does a singular value decomposition of the matrix $[A_1 \mid -B]$. The resulting equation is:

$$[U][\Sigma][V]^H \begin{bmatrix} a_1 \\ b \end{bmatrix} = -A_0 \tag{2.16}$$

where $\Sigma$ is the diagonal matrix with entries the singular values of the matrix $[A_1 \mid -B]$. Now the solution can be written as

$$\begin{bmatrix} a_1 \\ b \end{bmatrix} = -[V][\Sigma]^{-1}[U]^H A_0 \tag{2.17}$$

Hence  we  now  have  the  coefficients  of  the  polynomials  at  hand.  We  can  now 
approximate  the  current  at  any  frequency  of  interest.  Any  parameter  we  are 
interested  in  can  be  evaluated  from  the  current. 

It  must  be  pointed  out  that  the  Cauchy  method  can  be  used  for  the 
extrapolation  of  a  function  with  respect  to  any  variable.  In  electromagnetics, 
frequency  is  often  the  variable  of  interest. 
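A minimal Python sketch of the procedure is given below, restricted for brevity to function values only (no derivatives, i.e. $N_j = 0$ at every sample point), so that each row of the system reduces to $\sum_k s_j^k a_k - H(s_j) \sum_k s_j^k b_k = 0$. The coefficient vector is taken as the right singular vector belonging to the smallest singular value of $[A \mid -B]$. The function names, the test function `H_true` and the choice of sample points are assumptions made for illustration and are not taken from the original program.

```python
import numpy as np

def cauchy_fit(s, H, P, Q):
    """Fit H(s) ~ A(s)/B(s) with deg A = P, deg B = Q from samples H(s_j).

    Returns (a, b, singular_values); a and b are in ascending powers of s
    and are determined only up to a common scale factor.
    """
    s = np.asarray(s, dtype=complex)
    H = np.asarray(H, dtype=complex)
    A = np.vander(s, P + 1, increasing=True)                 # entries s_j^k
    B = H[:, None] * np.vander(s, Q + 1, increasing=True)    # entries H(s_j) s_j^k
    C = np.hstack([A, -B])                                   # the matrix [A | -B]
    _, sing, Vh = np.linalg.svd(C)
    x = Vh[-1].conj()            # right singular vector of the smallest singular value
    return x[:P + 1], x[P + 1:], sing

def cauchy_eval(a, b, s):
    """Evaluate A(s)/B(s); np.polyval expects the highest power first."""
    s = np.asarray(s, dtype=complex)
    return np.polyval(a[::-1], s) / np.polyval(b[::-1], s)

def H_true(s):
    """Hypothetical rational response used to generate narrow-band samples."""
    return (1.0 + 2.0 * s) / (1.0 + 0.5 * s + s ** 2)

# Six samples in a narrow band on the j-omega axis, then evaluation outside it.
s_samples = 1j * np.linspace(0.5, 1.5, 6)
a_fit, b_fit, sing = cauchy_fit(s_samples, H_true(s_samples), P=1, Q=2)
s_new = 3.0j
H_fit = cauchy_eval(a_fit, b_fit, s_new)
```

With noise-free samples and the correct orders, the smallest singular value is near zero and the fit reproduces the function outside the sampled band. The spread of the returned singular values gives the kind of order estimate described in the text.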


3. Interfacing with the Method of Moments

The usefulness of the above method is the ease with which it can be incorporated into a Method of Moments program. The Method of Moments results in an equation of the form

$$[V] = [Z][I] \tag{3.1}$$

Differentiating the above equation with respect to frequency results in a binomial expansion

$$[V]' = [Z]'[I] + [Z][I]' \tag{3.2}$$

$$\Rightarrow [I]' = [Z]^{-1}\big[[V]' - [Z]'[I]\big] \tag{3.3}$$

$$[V]'' = [Z]''[I] + 2[Z]'[I]' + [Z][I]'' \tag{3.4}$$

$$\Rightarrow [I]'' = [Z]^{-1}\big[[V]'' - 2[Z]'[I]' - [Z]''[I]\big] \tag{3.5}$$

In general,

$$[V]^{(n)} = \sum_{i=0}^{n} {}^{n}C_i\, [Z]^{(n-i)}\, [I]^{(i)} \tag{3.6}$$

$$\Rightarrow [I]^{(n)} = [Z]^{-1}\Big[[V]^{(n)} - \sum_{i=0}^{n-1} {}^{n}C_i\, [Z]^{(n-i)}\, [I]^{(i)}\Big] \tag{3.7}$$

In the above equations, $[V]^{(n)}$ is the vector with each element of $[V]$ differentiated with respect to frequency $n$ times. Similarly $[Z]^{(n)}$ is the matrix generated by differentiating each element of the Z-matrix with respect to frequency $n$ times.

Hence, using a Method of Moments program, we can generate all the information needed to apply the Cauchy Method. Each element in the solution $[I]$ matrix can be treated as our function $H(s)$. Given the function and its derivatives at some frequency points, one can evaluate the function at many more points.
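The first-derivative relation (3.3) can be sketched on a small example. Here a hypothetical $2 \times 2$ system with $[Z](s) = Z_0 + s Z_1$ and $[V](s) = V_0 + s V_1$ stands in for the Method of Moments matrices, so that $[Z]' = Z_1$ and $[V]' = V_1$. All matrices and names are invented for illustration, and a central finite difference is used only as an independent check.

```python
import numpy as np

def current_and_derivative(Z, dZ, V, dV):
    """Return [I] from (3.1) and [I]' from (3.3), given [Z], [Z]', [V], [V]'."""
    I = np.linalg.solve(Z, V)
    dI = np.linalg.solve(Z, dV - dZ @ I)
    return I, dI

# Hypothetical frequency-dependent system: Z(s) = Z0 + s Z1, V(s) = V0 + s V1.
Z0 = np.array([[2.0, 0.5], [0.5, 3.0]])
Z1 = np.array([[0.1, 0.2], [0.0, 0.3]])
V0 = np.array([1.0, 2.0])
V1 = np.array([0.5, -1.0])

def Z_of(s):
    return Z0 + s * Z1

def V_of(s):
    return V0 + s * V1

s0, h = 0.7, 1e-5
I0, dI0 = current_and_derivative(Z_of(s0), Z1, V_of(s0), V1)

# Independent check by a central difference of the solved currents.
I_plus = np.linalg.solve(Z_of(s0 + h), V_of(s0 + h))
I_minus = np.linalg.solve(Z_of(s0 - h), V_of(s0 - h))
dI_fd = (I_plus - I_minus) / (2.0 * h)
```

Both solves use the same matrix $[Z]$, so a single factorization of $[Z]$ can in principle be reused for every derivative order.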

4. The method in filter analysis

The Cauchy method can also be used in analysis of filters over broad frequency ranges. This is particularly useful in generating the stop band response given the pass band response and some stop band information. Also one can produce the pass band response given some stop band information and a little of the pass band response. A filter response is a ratio of two polynomials and hence lends itself easily to application in a Cauchy program.

5.1. With the Method of Moments

To test the Cauchy method, the RCS of a sphere was plotted over a wide frequency band. A program to calculate the RCS of an arbitrarily shaped body using triangular patching was used. It was modified to calculate the derivatives of currents as well. This information was used in the Cauchy subroutine. Also the same program was used to calculate the RCS without the Cauchy method. The RCS of a sphere was plotted as a function of $a/\lambda$, where $a$ is the radius of the sphere.

The points chosen for the Method of Moments program were between $\lambda = 0.8$ m and $\lambda = 1.4$ m at intervals of 0.1 m. Using the above method, currents at 300 frequency points in this range were evaluated.

The major saving arising from the Cauchy Method is in execution time. The time taken for the above extrapolation, as compared to the time taken to evaluate the RCS at ten frequency points in the same range, is shown below. The program was executed on a VAXstation 3100.

Method of Moments at 10 points: 3 hr 38 min 57.69 sec
Cauchy Method: 1 hr 50 min 06.12 sec

Of the time taken for the Cauchy program to execute, 1 hr 31 min 45.14 sec was taken by the Method of Moments program to evaluate the current and its four derivatives at three frequency points. The time taken for the evaluation of currents at 300 frequency points was just 18 min 20.98 sec.

Figure A.1 shows the results from the Method of Cauchy and the Method of Moments program. As can be seen from the figure, the approximation is very accurate over this broad frequency range.


5.2.  In  filter  analysis 

Another  application  of  the  Cauchy  method  is  in  filter  analysis.  A  filter 
transfer  function  was  measured  using  a  network  analyzer.  A  few  of  these 
points  were  chosen  as  inputs  to  a  Cauchy  program.  Two  different  cases 
were  tested.  One  was  the  generation  of  the  pass  band  response  using  stop 
band  information.  The  other  was  the  reverse,  i.e.,  the  generation  of  the  stop 
band  response  using  the  pass  band  information.  In  each  case  a  little  of  the 
unknown  band  response  was  required.  As  seen  from  Figures  A.3  and  A.4 
the  interpolation  and  extrapolation  was  extremely  accurate. 
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Figure  A.1;  RCS  of  a  sphere. 
Infinitely divisible probability density functions on the half-line $t \ge 0$
form a convolution semigroup on $t \ge 0$, as they describe stochastic
processes with stationary, non-negative, independent increments. A subclass
$\mathcal{D}$ of such densities consists of $C^\infty$ functions on the whole
t-line when extended by zero for $t < 0$. Such functions may be viewed as
physically realizable, causal, $C^\infty$ approximations to the Dirac
$\delta$-function, with further positivity properties. The use of such probe
waveforms for system identification is particularly advantageous in transient
wave propagation problems, where the system's impulse response is typically
highly singular. An ill-posed deconvolution problem must be solved to recover
the system's response; the semigroup and positivity properties of the input
probe enable this deconvolution problem to be implemented as a Cauchy problem
for a diffusion equation. This approach allows the analyst to monitor the
gradual and systematic development of sharp singularities in the presence of
noise. One important context where this theory applies is ultrasonic flaw
detection in nondestructive evaluation.

Perturbations of the originally designed pulse shape, due to amplifiers,
transducers, and other interfacing devices, may destroy infinite divisibility
and lead to waveforms with large negative oscillations. A much wider class of
probe waveforms can be constructed, the class $\mathcal{B}$, with
$\mathcal{D} \subset \mathcal{B}$, that includes such waveforms. Moreover, if
the perturbed pulse lies in $\mathcal{B}$, a simple linear transformation of
the noisy output data can be found that reduces the perturbed deconvolution
problem to one with a class $\mathcal{D}$ kernel. The search for this
transformation is accomplished in the Fourier domain, by comparing the
perturbed pulse with the originally designed pulse. The practical significance
of this observation lies in enabling the experimentalist to correct for
unintended effects of interfacing black boxes and recover a tractable
deconvolution problem. The procedure is illustrated with a numerical
experiment.
1. Introduction


Determination of the impulse response of a linear time invariant system is
an important objective in many areas of system identification. Frequently,
the system's complexity together with incomplete knowledge of its physical
characteristics preclude an analytical calculation [4]. In other cases,
the impulse response is needed to infer unknown inhomogeneities or other
properties of the system. One such example [5] is the use of impulse
responses for flaw size estimation and characterization in ultrasonic
nondestructive evaluation of materials. In such contexts, the impulse response
may be obtained experimentally by pulsing the system with a physically
realizable, smooth approximation to the Dirac $\delta$-function. The present
synopsis focuses on analytical considerations underlying the choice of probing
pulse and its impact on the subsequent deconvolution problem. A detailed
discussion, with further references to applications, is given in [2] and [3].

In one idealized experiment, an impulse of force $\delta(t)$ is applied at a
point $x$ on the surface of an infinite elastic plate; the output displacement
response at some other point $y$, not necessarily on the same side of the plate
as $x$, is called the dynamic Green's function $g(x, y, t)$. Non-dispersive
elastic wave propagation between the source and receiver causes $g(x, y, t)$ to
be a highly singular function of $t$ for fixed $x, y$. Sharp features,
including jumps, cusps, spikes, and the like, signal the arrivals of various
reflected waves, and characterize the object in the test configuration $x, y$.
However, if a smooth pulse waveform, $p(t)$, is applied at $x$, the output
response at $y$ is given by

$$b(t) = (p * g)(t) = \int_0^t p(t - \tau)\, g(x, y, \tau)\, d\tau, \qquad t \ge 0. \qquad (1.1)$$

Such convolution severely distorts and blurs the sharp features in $g(t)$, and
$b(t)$ cannot be used to identify the medium in that important singularities
may have been smoothed out. An ill-posed deconvolution problem must be
carefully solved to reconstruct $g(t)$, given $p(t)$ and the measured noisy
output $b_n(t)$ in lieu of $b(t)$. Moreover, a priori smoothness constraints on
$g(t)$ cannot be used to stabilize the inversion in the presence of noise.
Weaker constraints, such as an a priori bound, $M$, on the $L^2$ norm of
$g(t)$, together with an estimate, $\epsilon$, for the $L^2$ norm
$\|b - b_n\|$, must suffice. The noise to signal ratio $\omega = \epsilon/M$,
used as an adjustable regularization parameter in (3.4) below, is the only
a priori constraint in our deconvolution procedure. As a consequence, although
the $L^2$ error in the reconstructed $g(t)$ tends to zero as
$\epsilon \downarrow 0$, there is no information on the rate of convergence
[7], and error bounds in terms of the estimated noise level in $b_n(t)$ are
not possible.
2. Infinitely divisible probe pulses

A way of compensating for the lack of an error bound on $g(t)$ lies in
performing the deconvolution in slow motion. Here, the notion of infinite
divisibility plays a key role. We consider smooth pulses $p(t)$ satisfying

$$p(t) \in C^\infty; \quad p(t) = 0, \ t \le 0; \quad p(t) \ge 0, \ t > 0; \quad \int_0^\infty p(t)\, dt = 1. \qquad (2.1)$$

Such pulses represent one-sided (or causal) probability density functions.
Infinite divisibility of $p(t)$ requires the following additional property: for
every positive integer $m$, there exists a one-sided density $q_m(t)$
satisfying (2.1), such that $p(t)$ is the $m$-fold convolution of $q_m(t)$
with itself, i.e.,

$$p(t) = \{q_m(t)\}^{*m}. \qquad (2.2)$$

For large $m$, $q_m(t)$ is a narrow pulse concentrated near $t = 0$, and
$q_m(t)$ approaches $\delta(t)$ as $m \uparrow \infty$. The inverse Gaussian
pulse $p(\sigma, t)$, $\sigma > 0$, defined by

$$p(\sigma, t) = \frac{\sigma e^{-\sigma^2/4t}}{\sqrt{4\pi t^3}}, \qquad t > 0, \qquad (2.3)$$

is an example of (2.1) for which $q_m(t)$ can be written down explicitly; we
have $q_m(t) = p(\sigma/m, t)$. The pulse $s(\sigma, t)$, for $\sigma > 0$,
given by

$$s(\sigma, t) = \frac{e^{\sigma - t}\, e^{-\sigma^2/4t}}{\sqrt{\pi t}}, \qquad t > 0, \qquad (2.4)$$

is also infinitely divisible, has much the same shape as (2.3), but the $m$-th
convolution root of $s(\sigma, t)$ is not $s(\sigma/m, t)$. Although relatively
few $C^\infty$ infinitely divisible densities can be written down explicitly, a
rich variety of such functions exists, as the convolution of any two causal
infinitely divisible densities is again causal and infinitely divisible. While
(2.3) and (2.4) are unimodal pulses (see Figure A.1), quite complicated
multimodal pulses can be created by convolving (2.3) or (2.4) with discrete
Poisson densities. All such pulses belong to the class $\mathcal{D}$ defined as
follows: A one-sided infinitely divisible density $p(t) \in \mathcal{D}$ if and
only if there exist positive constants $A$, $c$, $\beta$, with $A \ge 1$ and
$\beta < 1$, such that

$$\sqrt{2\pi}\,|\hat{p}(\xi)| \le A e^{-c|\xi|^{\beta}}. \qquad (2.5)$$

Here,

$$\hat{f}(\xi) \equiv \mathcal{F}\{f(t)\} \equiv \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t)\, e^{-i\xi t}\, dt \qquad (2.6)$$

denotes the Fourier transform of $f(t)$. Thus, (2.3) and (2.4) respectively
satisfy

$$\sqrt{2\pi}\,|\hat{p}(\sigma, \xi)| = e^{-\sigma\sqrt{|\xi|/2}}, \qquad (2.7)$$

and 

e-c  'g-iate  ^  ^  (2.8) 

Note that infinite divisibility of $p(t)$ implies that $|\hat{p}(\xi)| > 0$ for
all real $\xi$. See, e.g., [6, p. 557].
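
The semigroup property behind infinite divisibility can be checked numerically for the inverse Gaussian pulse (2.3). The following is a minimal sketch, not part of the original paper: it assumes the form $p(\sigma, t) = \sigma e^{-\sigma^2/4t}/\sqrt{4\pi t^3}$, and the function names and the rectangle-rule quadrature are illustrative choices. It compares the convolution of $p(\sigma_1, \cdot)$ and $p(\sigma_2, \cdot)$ at one time point with $p(\sigma_1 + \sigma_2, \cdot)$; taking $\sigma_1 = \sigma_2 = \sigma/2$ gives the case $m = 2$ of (2.2).

```python
import math


def inverse_gaussian(sigma: float, t: float) -> float:
    """Inverse Gaussian pulse p(sigma, t), extended by zero for t <= 0 (assumed form of (2.3))."""
    if t <= 0.0:
        return 0.0
    return sigma * math.exp(-sigma ** 2 / (4.0 * t)) / math.sqrt(4.0 * math.pi * t ** 3)


def convolve_at(sigma1: float, sigma2: float, n: int, dt: float) -> float:
    """Rectangle-rule value of (p(sigma1, .) * p(sigma2, .)) at t = n * dt."""
    total = 0.0
    for j in range(n + 1):
        # Both factors vanish at the endpoints, so the rectangle rule
        # coincides with the trapezoid rule here.
        total += inverse_gaussian(sigma1, j * dt) * inverse_gaussian(sigma2, (n - j) * dt)
    return total * dt


if __name__ == "__main__":
    dt, n = 0.002, 500  # evaluate at t = n * dt = 1.0
    lhs = convolve_at(1.0, 1.5, n, dt)
    rhs = inverse_gaussian(2.5, n * dt)
    print(f"(p(1.0,.) * p(1.5,.))(1.0) = {lhs:.6f}")
    print(f"p(2.5, 1.0)               = {rhs:.6f}")
```

Because the integrand is smooth and flat to all orders at both endpoints, even this simple quadrature reproduces the semigroup identity closely on a modest grid.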

3.  Deconvolution  of  class  D  probes 

We now consider (1.1) when $p(t) \in \mathcal{D}$. If the exact data $b(t)$
were known, solving for $g(t)$ would be equivalent to finding $u(0, t)$ in the
following Cauchy problem:

$$\begin{aligned} \partial u/\partial x &= Tu, && x > 0,\ t > 0, \\ u(x, 0) &= 0, && x \ge 0, \\ u(1, t) &= b(t), && t \ge 0, \end{aligned} \qquad (3.1)$$

where $T$ is a linear pseudo-differential operator in the $t$ variable
determined by the input probe $p(t)$, and given by

$$(Tu)(x, t) = \mathcal{F}^{-1}\{\hat{u}(x, \xi)\, \log[\sqrt{2\pi}\,\hat{p}(\xi)]\}. \qquad (3.2)$$

Indeed, Fourier analysis of (3.1) gives

$$u(x, t) = \mathcal{F}^{-1}\{[\sqrt{2\pi}\,\hat{p}(\xi)]^{x-1}\, \hat{b}(\xi)\}, \qquad 0 \le x \le 1,\ t > 0, \qquad (3.3)$$

which reduces to $g(t)$ at $x = 0$. Infinite divisibility of $p(t)$ ensures that
(3.2) is well-defined, while (2.5) gives the Cauchy problem (3.1) a parabolic
character. The evolution of $u(x, t)$ as $x$ decreases from $x = 1$ to $x = 0$
is termed continuous deconvolution, and represents the progressive undoing
of smoothing caused by diffusion.

In the presence of noise, $b_n(t)$ replaces $b(t)$ on the left side of (1.1),
and a direct inversion is not feasible in that error amplification overwhelms
the reconstruction process. However, Tikhonov regularization of the ill-posed
Cauchy problem (3.1) leads to the following approximation for $u(x, t)$ on
$0 \le x \le 1$, $t \ge 0$:

$$v(x, t) = \mathcal{F}^{-1}\left\{ \frac{[\sqrt{2\pi}\,\hat{p}(\xi)]^{x}\, \overline{\hat{p}(\xi)}\, \hat{b}_n(\xi)}{\sqrt{2\pi}\,\{\, |\hat{p}(\xi)|^2 + [\omega^2/2\pi] \,\}} \right\}. \qquad (3.4)$$

Here, $\omega = \epsilon/M < 1$ is the $L^2$ noise to signal ratio. For small
fixed $x > 0$, $v(x, t)$ is a smooth approximation to the singular signal
$g(t)$ and represents a partial deconvolution. One has the following
'log-convex' error bound, which, apart from a factor of 2, is the best possible
in the $L^2$ norm,

$$\| u(x, \cdot) - v(x, \cdot) \|_2 \le 2 M^{1-x} \epsilon^{x}, \qquad 0 \le x \le 1. \qquad (3.5)$$

The parabolic nature of (3.1) can be exploited [2] to obtain error bounds
for the partial deconvolution and its time derivatives at any fixed $x > 0$,
in terms of $\epsilon$, $M$, $A$, $c$, and $\beta$. All of these estimates
degenerate at $x = 0$. However, for small $\epsilon$, one can validate the
sharp singularities in the total deconvolution $v(0, t)$ by observing their
early genesis at some $x > 0$ and following their systematic development as
$x \downarrow 0$.

An effective computational algorithm for obtaining the evolution of
$v(x, t)$ as $x \downarrow 0$ has been developed [2]. The algorithm is based on
the Poisson summation formula and is implemented in Laplace transform space
using FFT routines. Input to the algorithm consists of time-domain data;
namely, the recorded histories of the actual probe $p(t)$ and of the response
$b_n(t)$, digitized at $2N$ equispaced points on the finite interval
$[0, 2T]$, with $N$ and $T$ sufficiently large. The 'optimal' value of the
regularization parameter $\omega$ is best found interactively, starting from a
plausible first guess for the ratio $\epsilon/M$.
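
The idea of continuous deconvolution can be illustrated with a small periodic model. The sketch below is not the algorithm of [2] (which uses the Poisson summation formula in Laplace transform space); it is a hedged toy version under several assumptions: the probe is the inverse Gaussian pulse with symbol $\sqrt{2\pi}\,\hat{p}(\sigma, \xi) = e^{-\sigma\sqrt{i\xi}}$, convolution is replaced by a Fourier multiplier on a periodic grid with an odd number of samples, a naive DFT stands in for FFT routines, and the regularized filter is the standard Tikhonov form $\bar{P}/(|P|^2 + \omega^2)$ with $P = \sqrt{2\pi}\,\hat{p}$. All function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive forward DFT (kernel exp(-2*pi*i*j*k/n)); adequate for small n."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * math.pi * j * k / n) for j in range(n))
            for k in range(n)]


def idft(xf):
    """Naive inverse DFT."""
    n = len(xf)
    return [sum(xf[k] * cmath.exp(2j * math.pi * j * k / n) for k in range(n)) / n
            for j in range(n)]


def signed_frequencies(n, dt):
    """Angular frequencies in DFT index order; n is assumed odd (no Nyquist bin)."""
    return [2.0 * math.pi * (k if k <= n // 2 else k - n) / (n * dt) for k in range(n)]


def symbol_power(sigma, xi, x):
    """[sqrt(2*pi) * p_hat(sigma, xi)]**x for the inverse Gaussian probe (assumed symbol)."""
    return cmath.exp(-x * sigma * cmath.sqrt(1j * xi))


def blur(g, sigma, dt):
    """Periodic stand-in for b = p * g, applied as a Fourier multiplier."""
    xis = signed_frequencies(len(g), dt)
    gf = dft(g)
    bf = [symbol_power(sigma, xi, 1.0) * c for xi, c in zip(xis, gf)]
    return [v.real for v in idft(bf)]


def partial_deconvolution(b_n, x, sigma, omega, dt):
    """Tikhonov-regularized partial deconvolution v(x, .), 0 <= x <= 1."""
    xis = signed_frequencies(len(b_n), dt)
    bf = dft(b_n)
    vf = []
    for xi, c in zip(xis, bf):
        p1 = symbol_power(sigma, xi, 1.0)
        tikhonov = p1.conjugate() / (abs(p1) ** 2 + omega ** 2)
        vf.append(symbol_power(sigma, xi, x) * tikhonov * c)
    return [v.real for v in idft(vf)]


if __name__ == "__main__":
    n, dt, sigma, omega = 63, 0.1, 0.5, 1e-6
    g = [0.0] * n
    g[10], g[25] = 1.0, -0.5            # two spikes standing in for a singular response
    b = blur(g, sigma, dt)
    for x in (1.0, 0.5, 0.0):           # march from the data (x = 1) toward g (x = 0)
        v = partial_deconvolution(b, x, sigma, omega, dt)
        print(f"x = {x:.1f}: max of v(x, .) = {max(v):.4f}")
```

Stepping `x` from 1 down to 0 in small increments produces the family of traces whose gradual sharpening the text describes; with noisy data, `omega` would be raised toward the estimated noise to signal ratio.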

4. Perturbations and the class $\mathcal{B}$

An explicitly known class $\mathcal{D}$ pulse such as (2.3) or (2.4) can be
synthesized as an electrical voltage using a computer-driven digital to analog
(D/A) converter. (It is advantageous to use the lowest value of $\sigma$
compatible with the instrumentation bandwidth.) To produce a dynamic force
pulse having a prescribed time dependence, a high fidelity transducer is
necessary. However, the electrical signal must first be amplified to a level
sufficient to drive the transducer. The cumulative effects of the amplifier and
transducer may result in an actual mechanical pulse $q(t)$ markedly different
from the ideal, narrow, unimodal shape [1]. We will show how to get around this
difficulty in a large number of cases.

It is an interesting fact that there exist transformations of such
prototypical class $\mathcal{D}$ pulses as (2.3) and (2.4) that may drastically
change the time-domain character of these waveforms, while preserving the
non-vanishing property of their Fourier transforms. In particular, the
distorted waveforms may develop large negative oscillations and cease to be
probability densities altogether, let alone infinitely divisible ones.
Convolution of $\mathcal{D}$ with the one-sided functions $h(t)$ described
below represents only one such class of transformations. Other
transformations, linear or nonlinear, may produce similar results.

Consider any analytic function of the complex variable $z$ of the form

$$A(z) = \sum_{n=0}^{\infty} a_n z^n, \qquad a_n \ \text{real}, \quad a_0 \ne 0, \qquad (4.1)$$

such that for some $R > 0$,

$$0 < b_0 \le |A(z)| \le b_1 < \infty, \qquad |z| \le R. \qquad (4.2)$$

Let $f(t)$ be any real one-sided function (including linear combinations of
Dirac $\delta$-functions) such that $\sqrt{2\pi}\,|\hat{f}(\xi)| \le R$, and let

$$h(t) = \sum_{n=0}^{\infty} a_n \{f(t)\}^{*n}, \qquad (\{f(t)\}^{*0} = \delta(t)). \qquad (4.3)$$

Then $h(t)$ is a one-sided function, and

$$0 < b_0 \le \sqrt{2\pi}\,|\hat{h}(\xi)| \le b_1 < \infty. \qquad (4.4)$$


One may also rescale the time variable and form

$$h_1(t) = \sum_{n=0}^{\infty} a_n \{\theta f(\theta t - \psi)\}^{*n}, \qquad \theta > 0, \quad \psi \ge 0, \qquad (4.5)$$

while still retaining (4.4). The interesting case occurs when $f(t)$ includes a
finite sum $\sum_k c_k \delta(t - t_k)$, with $t_k$ positive and $c_k$ real.

With arbitrariness in both $A(z)$ and $f(t)$, a bewildering variety of pulse
shapes can be created by iterated convolutions of $p(t) \in \mathcal{D}$ with
such $h(t)$'s. The resulting waveform is always causal and $C^\infty$ on the
whole line. As a simple example, consider


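$$A(z) = e^{\lambda(z - 1)}, \quad \lambda \ \text{real}, \qquad f(t) = \delta(t - 1),$$

so that

$$h_\lambda(t) = e^{-\lambda} \sum_{n=0}^{\infty} \frac{\lambda^n}{n!}\, \delta(t - n), \qquad (4.6)$$

$$\sqrt{2\pi}\,\hat{h}_\lambda(\xi) = \exp\{\lambda(e^{-i\xi} - 1)\}. \qquad (4.7)$$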
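
The two-sided bound (4.4) can be verified directly for a Poisson-type example. The sketch below is illustrative only and rests on an assumption about the garbled source: that the example is $h_\lambda(t) = e^{-\lambda} \sum_n (\lambda^n/n!)\, \delta(t - n)$, whose symbol has modulus $e^{\lambda(\cos \xi - 1)}$. The function names and the truncation at 60 terms are hypothetical choices.

```python
import cmath
import math


def h_lambda_symbol(lam: float, xi: float, terms: int = 60) -> complex:
    """Truncated series for sqrt(2*pi) * h_hat(xi), with
    h(t) = exp(-lam) * sum_n lam**n / n! * delta(t - n)  (assumed example)."""
    total = 0j
    weight = math.exp(-lam)              # weight of the n = 0 delta function
    for n in range(terms):
        total += weight * cmath.exp(-1j * n * xi)
        weight *= lam / (n + 1)          # weight of the next delta function
    return total


def closed_form_magnitude(lam: float, xi: float) -> float:
    """|exp(lam * (exp(-i*xi) - 1))| = exp(lam * (cos(xi) - 1))."""
    return math.exp(lam * (math.cos(xi) - 1.0))


if __name__ == "__main__":
    lam = -0.5
    for xi in (0.0, 1.0, math.pi):
        series = abs(h_lambda_symbol(lam, xi))
        exact = closed_form_magnitude(lam, xi)
        print(f"xi = {xi:.3f}: series {series:.6f}, closed form {exact:.6f}")
```

For negative $\lambda$ the modulus lies between 1 and $e^{2|\lambda|}$, so $h_\lambda$ has negative weights (it is not a probability density) while its Fourier transform never vanishes, which is the situation the text exploits.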
Let

$$a = \sqrt{2}, \quad b = \sqrt{10}, \quad \lambda = -0.5, \quad \mu = -1.5, \qquad (4.8)$$

and let $s(0.5, t)$ be the unimodal pulse (2.4) with $\sigma = 0.5$, shown in
Figure A.1. Form successively,

$$q_1(t) = a\, s(0.5, at), \quad q_2(t) = (q_1 * h_\lambda)(t), \quad q_3(t) = b\, q_2(bt), \quad q_4(t) = (q_3 * h_\mu)(t). \qquad (4.9)$$

Then, $q_4(t)$ is the pulse shown in Figure A.2.

Let $\gamma = (4a^2b^2)^{1/4}$, $\alpha = 2|\lambda + \mu|$. Using (2.8) and (4.7), we have

g-o  '  '  TlUtl'  '  $  V^|q4(£.)|  ^  e"'“  c-'"  (4.10! 


Next, let $p(\nu, t)$ be the inverse Gaussian pulse (2.3). We observe from (2.7)
and (4.10) that if $\nu > (2\sigma + 2e^{-1})/\gamma \approx 0.58$, then


!p(v.^)l 

:q4(L)! 


(4.11) 


Choosing $\nu = 0.6$, we see from (4.11) that the complicated non-positive
pulse in Figure A.2 is bounded below in Fourier space by the narrow inverse
Gaussian shown in Figure A.1. This relationship is shown graphically in
Figure A.3 where $|\hat{q}_4(\xi)|^{-1}$ (solid curve), and
$2.5\,|\hat{p}(0.6, \xi)|^{-1}$ (dashed curve), are plotted as functions of
discrete frequency $\xi_k = k\pi/10.24$, $k = 1, \ldots, 650$, using FFT
routines. These considerations serve to motivate the following definition.


Definition 4.1. A function $q(t)$ is in class $\mathcal{B}$ if and only if
$q(t)$ is causal and $C^\infty$ on the whole line, with $|\hat{q}(\xi)| > 0$
for all real $\xi$, and there exist an infinitely divisible density
$p(t) \in \mathcal{D}$ and a positive constant $K = K(q, p)$, such that

$$\frac{|\hat{p}(\xi)|}{|\hat{q}(\xi)|} \le K, \qquad \xi \ \text{real}. \qquad (4.12)$$


$\mathcal{B}$ includes all functions of the form $q(t) = (p * h)(t)$ with
$p(t) \in \mathcal{D}$ and $h(t)$ of the form (4.3), and we have
$|\hat{p}(\xi)|/|\hat{q}(\xi)| \le 1/b_0$. In particular, choosing
$h(t) = \delta(t)$, it follows that $\mathcal{D} \subset \mathcal{B}$. Other
transformations of $p(t) \in \mathcal{D}$, possibly nonlinear, may also produce
objects $q(t) \in \mathcal{B}$.
5. Deconvolution of class $\mathcal{B}$ probes

We view membership in $\mathcal{B}$ as resulting from perturbations of the
originally intended class $\mathcal{D}$ probe, caused by interfacing devices.
Assuming the actual mechanical pulse $q(t) \in \mathcal{B}$, we use Fourier
analysis to find $p(t) \in \mathcal{D}$ such that (4.12) is satisfied. As in
Figure A.3, this may be accomplished by plotting $|\hat{q}(\xi)|^{-1}$ and
adjusting the width parameter of the candidate pulse $p(t)$ so that with a
reasonable constant $K$, the curve $K|\hat{p}(\xi)|^{-1}$ lies above the $q$
curve. We then let $\hat{d}(\xi) = \hat{p}(\xi)/\hat{q}(\xi)$, and refer to
$p(t)$ as the exchange pulse. Suppose

$$(q * g)(t) = c(t), \qquad t > 0, \qquad (5.1)$$

where $c(t)$ is the output response that would have been recorded in the
absence of noise. Let $c_n(t)$ be the noisy output data. As before, we assume

$$\|g\|_2 \le M, \qquad \|c - c_n\|_2 \le \eta, \qquad (5.2)$$

where $\eta \ll M$. Let

$$\hat{b}(\xi) = \hat{d}(\xi)\,\hat{c}(\xi), \qquad \hat{b}_n(\xi) = \hat{d}(\xi)\,\hat{c}_n(\xi). \qquad (5.3)$$

From (5.2), (4.12),

$$\|b - b_n\|_2 \le \epsilon, \qquad \epsilon = K\eta. \qquad (5.4)$$

Fourier transforming (5.1) and using (5.3), we see that $(p * g)(t) = b(t)$,
while $b_n(t)$ is the noisy output data corresponding to the exchange pulse
$p(t) \in \mathcal{D}$. Thus, if $q(t) \in \mathcal{B}$, multiplication of the
output $\hat{c}_n(\xi)$ by the bounded function $\hat{d}(\xi)$ reduces the
deconvolution problem to the class $\mathcal{D}$ case, with bounded noise
magnification, $\epsilon = K\eta$. With $\omega = \epsilon/M$, we may now
construct the family of partial deconvolutions $v(x, t)$ in (3.4) for which
the error bound (3.5) holds.
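
The exchange step amounts to one multiplication in the Fourier domain. The following sketch works directly with Fourier symbols on a sampled frequency grid and makes several labeled assumptions: the exchange pulse is the inverse Gaussian with symbol $e^{-\nu\sqrt{i\xi}}$, and the distorted pulse is a hypothetical $q = p(\sigma, \cdot) * h_\lambda$ with a Poisson-type $h_\lambda$ of symbol $\exp\{\lambda(e^{-i\xi} - 1)\}$. The helper names are illustrative, and the constant returned is only the maximum of $|\hat{d}|$ over the sampled frequencies.

```python
import cmath


def class_d_symbol(nu, xi):
    """sqrt(2*pi) * p_hat(nu, xi) for the inverse Gaussian exchange pulse (assumed form)."""
    return cmath.exp(-nu * cmath.sqrt(1j * xi))


def distorted_symbol(sigma, lam, xi):
    """Symbol of a hypothetical distorted pulse q = p(sigma, .) * h_lambda."""
    return class_d_symbol(sigma, xi) * cmath.exp(lam * (cmath.exp(-1j * xi) - 1.0))


def exchange_data(c_hat, xis, nu, sigma, lam):
    """Return (b_hat, K): b_hat = d_hat * c_hat with d_hat = p_hat / q_hat,
    and K = max |d_hat| over the sampled frequencies."""
    d_hat = [class_d_symbol(nu, xi) / distorted_symbol(sigma, lam, xi) for xi in xis]
    b_hat = [d * c for d, c in zip(d_hat, c_hat)]
    return b_hat, max(abs(d) for d in d_hat)


if __name__ == "__main__":
    xis = [0.1 * k for k in range(-100, 101)]
    g_hat = [1.0 / (1.0 + xi * xi) for xi in xis]          # stand-in spectrum of g
    c_hat = [distorted_symbol(0.5, -0.5, xi) * gh for xi, gh in zip(xis, g_hat)]
    b_hat, K = exchange_data(c_hat, xis, nu=0.6, sigma=0.5, lam=-0.5)
    print(f"noise magnification bound on this grid: K = {K:.4f}")
```

After the exchange, `b_hat` is the data that the class $\mathcal{D}$ pulse would have produced, so the regularized family $v(x, t)$ can be formed with $\omega = K\eta/M$.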

6.  A  numerical  experiment 

We now illustrate the foregoing development with a numerical reconstruction
experiment using synthetic noisy data. Figure A.4 represents the theoretically
calculated impulse response $g(t)$ of a homogeneous infinite elastic plate,
where the source and receiver are on opposite sides of the plate, with the
receiver located at the epicenter [8]. The sharp spikes are numerical
$\delta$-functions, with support equal to one mesh interval $\Delta t$, and
with height $a/\Delta t$, the weights $a$ being determined by the physics. The
spikes, and other singularities, indicate the arrivals of elastic disturbances
and their subsequent multiple reflections from the plate faces. The drawing
displays normalized displacement versus normalized time where

Normalized displacement = $\pi$ × shear modulus × plate thickness
                          × actual displacement/force,
Normalized time = actual time × shear wave speed/plate thickness.

The presence of flaws would generate additional reflections, resulting in a
signature different from that shown in Figure A.4.

The pulse $q_4(t)$ shown in Figure A.2, with $t$ measured in normalized
time units, was used to simulate a distorted mechanical pulse applied at the
plate surface. While such distortion is substantially worse than is typically
the case in experimental work, the robustness of the deconvolution procedure
is best demonstrated by considering extreme cases. The corresponding
epicentral response $c(t) = (q_4 * g)(t)$, evaluated by numerical quadratures,
is shown in Figure A.5. Each of $g(t)$, $q_4(t)$, and $c(t)$ was calculated at
500 equispaced points on the normalized time interval $[0, 5]$. Evidently,
there is little correlation between Figures A.4 and A.5.

Next, noisy data $c_n(t)$ were constructed by adding to each data value
$y$ in $c(t)$ a random number drawn from a uniform distribution in the range
$\pm 0.005y$. A noise level of between 0.1% and 1% is believed to be
representative of experimental conditions. The inverse Gaussian shown in
Figure A.1 was used as the exchange pulse, with corresponding data $b_n(t)$
obtained
from  (5. .51.  With  to  =^.  1.0  ■  10  the  family  v(x,  t )  was  evaluated  using  {.5.4 ) 
at 16 equispaced values of $x$ on the interval $0 \le x \le 1$. The evolution
is shown in Figure A.6. In that drawing, the first trace, in the foreground, is
$b_n(t)$; the last trace, in the background, is the reconstructed $g(t)$.
Although there is no visual hint of spikes in the foreground trace $b_n(t)$,
early genesis of these singularities and their subsequent systematic
development as $x \downarrow 0$ are noteworthy features in Figure A.6. There is
also an easily assimilated visual relationship between successive traces, which
facilitates pattern recognition. These effects are a reflection of the
diffusion process associated with the exchange pulse $p(t)$.
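
The construction of synthetic noisy data described above can be sketched in a few lines. This is an illustrative helper, not the authors' code: the relative noise level is left as a parameter, the seed and function names are hypothetical, and the discrete $L^2$ norm uses a simple rectangle rule. The second function gives an empirical value for the data error $\eta$ in (5.2).

```python
import random


def add_relative_uniform_noise(values, level, seed=0):
    """Add to each value y a random number drawn uniformly from [-level*|y|, +level*|y|]."""
    rng = random.Random(seed)            # fixed seed so the experiment is repeatable
    return [y + rng.uniform(-level * abs(y), level * abs(y)) for y in values]


def l2_norm(values, dt):
    """Rectangle-rule approximation to the L2 norm of a sampled signal."""
    return (dt * sum(v * v for v in values)) ** 0.5


if __name__ == "__main__":
    dt = 0.01
    c = [((k * dt) ** 2) * (5.0 - k * dt) for k in range(500)]   # stand-in for c(t) on [0, 5]
    c_n = add_relative_uniform_noise(c, level=0.005)
    eta = l2_norm([a - b for a, b in zip(c_n, c)], dt)
    print(f"eta = {eta:.3e}, relative to ||c||: {eta / l2_norm(c, dt):.3e}")
```

Since the perturbation is bounded pointwise by `level * |y|`, the resulting $\eta$ never exceeds `level` times the $L^2$ norm of the clean data.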

We remark that it is possible to apply the deconvolution algorithm
directly to $(q_4 * g)(t) = c(t)$, foregoing the exchange option, by
substituting $q_4$ and $c_n$ for $p$ and $b_n$ in (3.4). In that case, the
underlying Cauchy problem (3.1) is not of parabolic type. The resulting
evolution is shown in Figure A.7. While the last trace in that drawing is a
good approximation to $g(t)$, the development of singularities is not easily
discernible, as the non-positivity properties of $q_4(t)$ result in a tortuous
evolution of the data $c_n(t)$ into $g(t)$. In the presence of flaws, where
additional reflections can be expected to produce fairly complex signatures,
pattern recognition may not be feasible.
A. Figures

Figure A.1: Examples of unimodal pulses in class $\mathcal{D}$.

Figure A.3: Graphical idea behind Definition 4.1.

Figure A.4: Response of homogeneous elastic plate to Dirac $\delta$-function
source, when source and receiver are on opposite sides of plate, with
receiver located at epicenter.

Figure A.5: Response to probe pulse of Figure A.2 with test configuration
as in Figure A.4.

Figure A.6: Continuous deconvolution of response in Figure A.5
after using exchange option.

Figure A.7: Continuous deconvolution of response in Figure A.5
without use of exchange option.
This work is about spatiotemporal random fields and their applications in
environmental research. Ordinary and generalized random fields are studied,
and certain important classes of space nonhomogeneous/time nonstationary
random fields are derived. Results are obtained regarding the optimal
estimation and simulation of such fields in space and time.


1.  Introduction 

This presentation studies spatiotemporal natural processes, that is, processes
which develop simultaneously in space and in time. In Section 2 we discuss
the emergence of spatiotemporal natural processes in various branches of the
physical sciences and address the fundamental hypotheses and problems
regarding the quantitative description of such processes. Several practical
issues of spatiotemporal data analysis and processing are presented and the
variety of potential applications is reviewed. The latter is followed by a
critical discussion of the inadequacies of previous works on the subject.

In order to proceed with the rigorous mathematical modelling of natural
processes which change in space and time, one must elaborate a theory of
spatiotemporal random fields (S/TRF). This theory is presented in
Sections 3 through 6. These mathematical results then act as the
theoretical support for the discrete parameter representations, as well as the
optimal space-time estimation and simulation methods, which are discussed
in a more practical context in Sections 7, 8 and 9.


† The research was supported in part by the R.J. Reynolds Fund, University of
North Carolina at Chapel Hill.
2. Spatiotemporal natural processes

Spatiotemporal processes, that is, processes which develop simultaneously
in space and time, occur in nearly all areas of the applied sciences, such
as: hydrogeology (e.g., water vapor concentrations, soil moisture content);
environmental engineering (e.g., concentrations of pollutants in environmental
media such as water, air, soil, and biota); climate predictions and meteorology
(e.g., variations of atmospheric temperature, density, and velocity); and oil
reservoir engineering (e.g., porosities, permeabilities and fluid saturations
during the production phase).

In this context, important issues include: the assessment of the
spatiotemporal variability of the earth's surface temperature and the
prediction of extreme conditions; the assessment of space-time trends in runoff
on the basis of a spatially and temporally sparse data base; the estimation of
the soil moisture content at unmeasured locations in space and instants in
time; the reconstruction of the whole field of a climate parameter using all
the space-time data efficiently; the study of the transport of pollutants
through porous media; the elucidation of the spatiotemporal distribution of
rainfall for satellite remote-sensing studies; the optimal sampling design of
meteorological observations; and the simulation of oil reservoir
characteristics as a function of the spatial position and the production time.

The issues above are parts of the general problem of analysis and processing
of data from space-time physical phenomena. In all these situations, the
spatiotemporal pattern of change of the natural processes involved possesses
a certain structure at the macroscopic level and a purely random character
at the microscopic level. The latter implies a significant amount of
uncertainty in spatiotemporal variation. Moreover, this variation is, in
general, space nonhomogeneous and time nonstationary (there may exist complex
trends in space, time-varying correlation structures, significant space-time
cross-effects, etc.). Spatiotemporal variability plays a substantial role in
the understanding, modelling and prediction of surficial processes in
space-time. It is also very important in improving our basic knowledge
regarding the climatological influences on the hydrogeology of a region. If
neglected, the spatiotemporal variability of the parameters of water
management models may adversely influence management decisions.

Typically, space-time data analysis and processing problems have been
handled under some convenient but rather simplistic assumptions. In
hydrogeology and water resources research, common statistical methods of
analysis create artificial decompositions of hydrologic processes (one in
space and one in time) and study them separately [25, 10]; or focus on time
averages (monthly, seasonal, annual) of the hydrologic parameters; or make
additional assumptions, like space homogeneity and weak time dependency
(e.g., [4]). The multivariate analysis concept which has been used in a number
of hydrologic problems (e.g., [11]) accounts for the vector formulation
of the scalar time series model, where the component time series are
correlated to each other. Variability in space is not taken into consideration,
and the modelling of the combined evolution of these series in space and
in time is clearly not an issue addressed by multivariate analysis. Similar
decompositions have been applied in some recent studies on the assessment
of Ireland's wind power resource (e.g., [17]). Moreover, classical statistics
and time-series methods have failed to provide a conceptual framework
determining the correlation structure of the spatiotemporal heterogeneity of
soil-water properties from local to global scales.

In environmental research the existing models (e.g., [2, 15]) either apply
traditional methods of classical statistics which are incapable of capturing
important features of the space-time structure, or have been designed to
handle problems that are significantly different in nature from those arising
in the spatiotemporal data analysis and processing context considered above.
In particular, the class of classical statistics models does not determine any
law of change of the environmental parameters, and the relative distances of
the sample locations/instants over space-time do not enter the analysis of
the correlation structure. The second class of available environmental models
concerns either specific space-time interaction systems where the input/output
physical parameters are treated at each spatial location as separate
time series, or the description of the system's transfer function by means of
some special space-time patterns. These models do not provide an adequate
quantitative assessment of spatiotemporal variability in general, and they
do not account for the space nonhomogeneous and/or time nonstationary
characteristics of the environmental parameters in particular. In some recent
environmental studies the spatio-chronological order of the data is not
properly considered, and arbitrary but not well justified decompositions of
the correlation functions are assumed. Moreover, optimal reconstruction
schemes which are general enough to cover the majority of applications
have not been developed; see, e.g., comments made in [28]; also by Bilonick
[3]; and by Rouhani and Hall [26] in a geostatistical framework. Space-time
models which are based on the distributed parameter concept [30] are not
in general appropriate for most environmental problems. These models
are assumed to be governed by a differential equation of a particular form
that does not represent adequately the majority of the spatiotemporal natural
processes of interest; issues of stability, controllability and observability
involve serious difficulties.

In reservoir characterization, space-time data processing does not exist at
present. Most of the techniques available account exclusively for the spatial
variation of geological reservoir processes, when in reality these processes
are simultaneously a function of spatial location and production time (e.g.,
[20]). Also, current practices in data collection, with the exception of some
oil sand deposits, do not account for time. Space-time models do not exist at
present in reservoir characterization because the need for detailed and
advanced reservoir characterization has been recognized only recently.

The methods used for statistical climate modelling and prediction are usually
somewhat primitive versions of the methods used for weather analysis
and prediction (e.g., [12, 21, 31]). Many of them suffer from the same
limitations as the methods used in hydrology. For example, the basic ansatz of
multivariate techniques such as "principal oscillation pattern" and "principal
interaction pattern" [31] is based on the arbitrary assumption that the
space-time characteristics of a low-order system are the same as those of the
full system. Also, important issues such as the characterization of
spatiotemporal intermittency or spottiness in rainfall as it pertains to
various notions of scaling, as well as the physically observed features of
clustering, growth, and decay of convective cells, and larger-scale
spatiotemporal forms observed in mesoscale rainfall systems, cannot be
addressed by the existing statistical methods (see, e.g., [9]).

In global warming research, aspects of current interest are as follows:

1) Many eminent authors claim that while one certainly cannot assert
that no warming occurred, the existing statistical analysis of the earth's
surface temperature data is unable to provide adequate assessments
of the space-time variability of temperature, and it does not lead
to convincing arguments supporting the concept that changes at the
macroscopic level are due to greenhouse warming rather than to natural
space-time variability (e.g., [21]).

2) In water resources management the existence of a warming trend raises
the question whether global warming has been sufficient to
translate into a corresponding change in the spatiotemporal structure
of runoff series. Again, current statistical analyses of runoff series are
subject to serious question, given that they are based on observations
from a spatially and temporally sparse data base and they assume
no model for the underlying spatiotemporal evolution of the
runoff series.

Clearly, the temperature data in Item 1) and the runoff series studies in
Item 2) above are typical examples of analyses where the theoretical models
used are incapable of providing adequate representations of the spatiotemporal
variability and, hence, cannot give satisfactory answers to crucial
questions concerning climate and water resources problems.

The main reasons for such analyses of spatiotemporal data, which are clearly
inadequate from various viewpoints, should be attributed to the following
facts: (i) the importance of spatiotemporal variability in the study of
space-time phenomena was not fully appreciated until recently; and (ii) most
of the theoretical tools and mathematical techniques of data processing
available have been designed to operate exclusively in time (time series methods;
e.g., [16]) or exclusively in space (random fields, geostatistics; e.g., [22, 32,
33]). Undoubtedly, the literature on the subject of applied space-time data
analysis and processing is very limited and most aspects of importance in
the analysis, modelling and estimation of spatiotemporal parameters have
not been studied adequately.

In  view  of  the  foregoing,  the  following  conclusions  are  drawn: 

1)  Any  modelling  assumption  should  reflect  adequately  the  macroscopic 
and  microscopic  evolution  characteristics  of  the  underlying  processes 
over  space  and  time.  The  latter  is  a  requisite  for  the  understanding 
and  prediction  of  spatiotemporal  processes  in  hydrogeology,  climate 
modelling  and  environmental  pollution  monitoring  and  control. 

2) Due to the random character in the variability of the data at the
microscopic level, these processes must naturally be described stochastically;
the concept of randomness should be viewed as an intrinsic part of the
space-time evolution, and not only as a statistical description of possible
states.

3) The proper model should be capable of assessing quantitatively any
space nonhomogeneous/time nonstationary variability features and
of providing efficient solutions to practical problems, such as space-time
estimation.

Taking  these  issues  into  account,  it  seems  quite  reasonable  that  the 
concept  of  an  S/TRF  is  the  appropriate  stochastic  model  for  spatiotemporal 
processes. Within the framework of the S/TRF model, space and time form a
combined process having simultaneous and interrelated effects on the
evolution of the natural variable it represents. Suitable methodological
hypotheses and operational tools assure that the mathematical concept of S/TRF is
compatible with the physics of the variate it describes and, thus, that it is
applicable in practice. Lastly, conclusions regarding the spatiotemporal variability
(trends in space, periodicities in time, nonhomogeneous/nonstationary
correlations, etc.) can be established in terms of duality principles that relate the
mathematical notions and the physical behavior of the process they model.
Here,  stochastic  spatiotemporal  correlation  functions  provide  the  means  for 
structural  inferences. 

In general, the objectives of spatiotemporal data analysis and processing
are: (a) to assess quantitatively the spatiotemporal variability of the
natural processes of interest (degree of regularity, continuity, nonhomogeneous
spatial features, nonstationary characteristics, etc.); and (b) to provide
efficient and computationally attractive procedures for deriving optimal (in
a well defined mathematical sense) and physically meaningful estimation
maps of the natural process, at unknown points in space and/or instants in
time, based on fragmentary space-time data.

Of course, the outcomes of space-time data analysis and processing
may not be an end in themselves. Several important consequences will
emerge in the context of earth sciences. More specifically:

1)  A  deeper  understanding  of  the  physics  of  the  space-time  processes 
will  be  obtained.  For  instance,  knowledge  about  the  spatiotemporal 
variability  of  the  various  climate  parameters  will  improve  our  basic 
understanding  of  how  the  global  climate  actually  functions. 

2) The predictive capabilities of many computer-based differential equation
models in hydrology and environmental research are limited because
the parameters of the models are difficult to determine. Much of
this difficulty may stem (a) from the spatiotemporal variability of the
media, and (b) from identifiable differences in initial physical assumptions.
It is, hence, of significant importance to understand how (a) and
(b) influence the outcomes of modelling.

3) Space-time data analysis and processing will provide the necessary
means for solving important problems in various areas of water resources.
Information about the spatiotemporal-parameter variability
of a water resource system will allow the detailed simulation of the
system and will influence considerably management decisions. The
assessment of the spatiotemporal variability of pollutant concentrations
will provide the knowledge needed to monitor and control environmental
pollution. S/TRF simulations of the anticipated effects on
surface temperature due to the increase of carbon dioxide in the atmosphere
over a specific time period will provide valuable insight into
the study of global warming issues. In connection to this, the possible
effects of the coupled increase of precipitation and temperature on the
hydrology of a particular region can be determined; then, conclusions
could be derived about the incorporation of climatic changes into the
planning of future earth systems, and the modification of the operating
rules of existing water resource systems.

An S/TRF is termed continuous parameter or discrete parameter according
to whether its space-time arguments take continuous
or discrete sets of values.

3.  Ordinary  spatiotemporal  random  fields 

3.1.  The  basic  space-time  notions 

Let $s = (s_1, s_2, \ldots, s_n) \in \mathbb{R}^n$ ($\mathbb{R}^n$ is the Euclidean space of dimensionality
$n \ge 1$) with $|s|^2 = \sum_{i=1}^{n} s_i^2$, and $t \in T$ ($T \subseteq \mathbb{R}^1_{+} = \{t \in \mathbb{R}^1 : t \ge 0\}$).
In the Cartesian product $\mathbb{R}^n \times T$ let $(s,t) \in \mathbb{R}^n \times T$ denote space-time
coordinates, such that $|(s,t)|^2 = |s|^2 + t^2$. We also define
$(s,t)^{(\alpha,\beta)} = s^{\alpha} t^{\beta} = s_1^{\alpha_1} \cdots s_n^{\alpha_n} t^{\beta}$, where $\beta$ is a nonnegative integer and
$\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)$ is a multi-index of nonnegative integers such that
$|\alpha| = \sum_{i=1}^{n} \alpha_i$ and $\alpha! = \alpha_1! \, \alpha_2! \cdots \alpha_n!$.

We define some spaces of spatiotemporal functions $X(s,t)$ in $\mathbb{R}^n \times T$,
which are useful within the framework of the present study: The space $C$
of all real and continuous space-time functions with compact support (i.e.,
they vanish outside some bounded region). The space $K$ of all real, continuous
and infinitely differentiable functions in space and time with compact
support. The space $S$ of all real, continuous and infinitely differentiable
functions which, together with their derivatives of all orders, approach zero
more rapidly than any power of $1/|(s,t)|$ as $|(s,t)| \to \infty$. Notice that $S \supset K$,
as all functions in $K$ vanish identically outside a finite support, whereas those
in $S$ merely decrease rapidly at infinity. Spaces $K$ and $S$ are of particular
importance in this study. The topology in $K$ and $S$ is in the sense of [27] where,
in view of the aforementioned space-time considerations, the argument is
now $(s,t) \in \mathbb{R}^n \times T$.


3.2.  Definition  of  ordinary  spatiotemporal  random  fields 

Let $(\Omega, F, P)$ be a probability space, where $\Omega$ is the sample space, $F$ is the
$\sigma$-field of subsets of $\Omega$ and $P$ is the probability measure on the measurable space
$(\Omega, F)$ satisfying Kolmogorov's axioms; let $z = (s,t)$. We denote by
$L_2(\Omega, F, P)$ the Hilbert space of all continuous-parameter random variables
$x_1, \ldots, x_m$ defined at $z_1, \ldots, z_m$ and endowed with the scalar product

$$(x_1, x_2) = E[x_1 x_2] = \iint \chi_1 \chi_2 \, dF_x(\chi_1, \chi_2), \tag{3.1}$$

where $F_x(\chi_1, \chi_2)$ denotes the joint probability distribution of the random
variables $x_1$ and $x_2$, while

$$\|x\|^2 = E|x|^2 = \int \chi^2 \, dF_x(\chi) < \infty, \tag{3.2}$$

where $F_x(\chi)$ denotes the probability distribution of $x$. Usually $F_x(\chi)$ and
$F_x(\chi_1, \chi_2)$ are assumed to be differentiable so that they can be replaced by
the probability densities $f_x(\chi)$ and $f_x(\chi_1, \chi_2)$.


Definition 3.1. The ordinary S/TRF (OS/TRF) $X(s,t)$ is defined as the
function on the Cartesian product $\mathbb{R}^n \times T$ with values in the Hilbert space
$L_2(\Omega, F, P)$, viz.

$$X : \mathbb{R}^n \times T \to L_2(\Omega, F, P). \tag{3.3}$$
Just as for purely spatial RF (SRF), an S/TRF $X(z) = X(s,t)$ is specified
completely by means of all finite dimensional probability measures $\mu_x(B)$
associated with the families of random variables $x_1, \ldots, x_m$ at $z_1, \ldots, z_m$:

$$\mu_x(B) = \mu_{z_1,\ldots,z_m}(B) = P[(x_1, \ldots, x_m) \in B] \quad \text{for every } B \in \mathcal{B}^m$$

($\mathcal{B}^m$ is a suitably chosen $\sigma$-field of subsets of $\mathbb{R}^m$) and all $m = 1, 2, \ldots$ The
corresponding probability density functions are written as

$$f_x(\chi_1, \ldots, \chi_m) \, d\chi_1 \cdots d\chi_m = f_{z_1,\ldots,z_m}(\chi_1, \ldots, \chi_m) \, d\chi_1 \cdots d\chi_m
= P[\chi_1 \le x(z_1) \le \chi_1 + d\chi_1, \ldots, \chi_m \le x(z_m) \le \chi_m + d\chi_m] \tag{3.4}$$

for all $m$. All OS/TRF to be considered will be continuous in the mean square
sense, i.e., $E|X(s',t') - X(s,t)|^2 \to 0$ when $s' \to s$ and $t' \to t$. Moreover,
OS/TRF are, in general, taken to represent space nonhomogeneous/time
nonstationary natural processes (e.g., spatiotemporal history of soil shear stresses
during an earthquake, oil reservoir porosity distribution in space-time during
the production phase). The space of all continuous OS/TRF will be
denoted by $\mathcal{K}$.

In the sequel we will consider second-order OS/TRF, that is, the analysis
will be based on second order statistical moments which are assumed
to be continuous and finite. More precisely, an OS/TRF $X(s,t)$ will be
characterized in terms of its spatiotemporal mean value

$$m_x(s,t) = E[X(s,t)] = \int \chi f_x(\chi) \, d\chi, \tag{3.5}$$

the centered spatiotemporal covariance function

$$c_x(s,t;s',t') = E[(X(s,t) - m_x(s,t))(X(s',t') - m_x(s',t'))]
= \iint (\chi_1 - m_1)(\chi_2 - m_2) f_x(\chi_1, \chi_2) \, d\chi_1 \, d\chi_2 \tag{3.6}$$

and the spatiotemporal semivariogram or structure function

$$\gamma_x(s,t;s',t') = \tfrac{1}{2} E[X(s,t) - X(s',t')]^2
= \tfrac{1}{2} \iint (\chi_1 - \chi_2)^2 f_x(\chi_1, \chi_2) \, d\chi_1 \, d\chi_2. \tag{3.7}$$

A continuous function $c_x(s,t;s',t')$ is the covariance function of an
OS/TRF if and only if it satisfies the nonnegative-definiteness condition

$$\sum_{i=1}^{m} \sum_{j=1}^{k_i} \sum_{i'=1}^{m} \sum_{j'=1}^{k_{i'}} q_{ij} \, \bar{q}_{i'j'} \, c_x(s_i, t_j; s_{i'}, t_{j'}) \ge 0 \tag{3.8}$$

for all $m$, $k_i$ ($= 1, 2, \ldots$), all $(s_i, t_j) \in \mathbb{R}^n \times T$ and all numbers (real or
complex) $q_{ij}$, $q_{i'j'}$; here $k_i$ denotes the number of time instants $t_j$, $j = 1, 2,
\ldots, k_i$ used, given that we are at the spatial position $s_i$.
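As an illustration of condition (3.8), the following is a minimal sketch that evaluates the quadratic form for randomly drawn real weights. The separable exponential model `cov_sep`, its parameters, and the sample points are assumptions made only for this example (they do not come from the text), and each spatial position is paired with a single time instant ($k_i = 1$).

```python
import math
import random


def cov_sep(s, t, s2, t2, sigma2=1.0, a=2.0, b=3.0):
    """Hypothetical separable covariance in one spatial dimension:
    sigma2 * exp(-|s - s2| / a) * exp(-|t - t2| / b)."""
    return sigma2 * math.exp(-abs(s - s2) / a) * math.exp(-abs(t - t2) / b)


def quadratic_form(points, weights, cov):
    """Left-hand side of (3.8) for real weights, one time instant per position."""
    total = 0.0
    for (s, t), q in zip(points, weights):
        for (s2, t2), q2 in zip(points, weights):
            total += q * q2 * cov(s, t, s2, t2)
    return total


rng = random.Random(0)
# Twelve arbitrary space-time points (s, t) with t >= 0.
points = [(rng.uniform(0.0, 10.0), rng.uniform(0.0, 5.0)) for _ in range(12)]

forms = []
for _ in range(200):
    weights = [rng.uniform(-1.0, 1.0) for _ in points]
    forms.append(quadratic_form(points, weights, cov_sep))

min_form = min(forms)
print("smallest quadratic form over 200 weight draws:", min_form)
```

A numerical check of this kind can only refute a candidate model (by finding a negative value); it does not prove nonnegative-definiteness.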

Instead of the centered covariance function one may also define the
non-centered spatiotemporal covariance function

$$E[X(s,t) X(s',t')] = c_x(s,t;s',t') + m_x(s,t) \, m_x(s',t'). \tag{3.9}$$

The other mode of second-order analysis is that in the frequency domain.
The harmonic expansions of $X(s,t)$ can be considered as an extension in the
space-time context of the relevant results for SRF (e.g., [7]). In particular (for
simplicity's sake, the symbol $\mathbb{R}^n \times T$ under the integrals will be omitted in the
following)

$$X(s,t) = \iint \exp[i(w \cdot s + \lambda t)] \, \tilde{X}(w, \lambda) \, dw \, d\lambda, \tag{3.10}$$

where $i = \sqrt{-1}$ and $\tilde{X}(w, \lambda)$ is the so-called spectral amplitude of $X(s,t)$.
The corresponding spectral density function $C_x(w, \lambda; w', \lambda')$ is defined by

$$c_x(s,t;s',t') = \iiiint \exp[i(w \cdot s - w' \cdot s' + \lambda t - \lambda' t')] \,
C_x(w, \lambda; w', \lambda') \, dw \, d\lambda \, dw' \, d\lambda', \tag{3.11}$$

where $C_x(w, \lambda; w', \lambda')$ is a positive summable function in $\mathbb{R}^n \times T$. The
$C_x(w, \lambda; w', \lambda')$ forms an $\mathbb{R}^n \times T$-fold Fourier transform pair with the
spatiotemporal covariance $c_x(s,t;s',t')$.


3.3.  Space  homogeneous/time  stationary  spatiotemporal  random  fields 

An OS/TRF $X(s,t)$, $(s,t) \in \mathbb{R}^n \times T$ will be called space homogeneous/time
stationary in the strict sense if all the multidimensional probability densities
are invariant under the translation $z \to z + \delta z$ (where, as before, $z = (s,t)$):

$$P[\chi_1 \le x(z_1) \le \chi_1 + d\chi_1, \ldots, \chi_m \le x(z_m) \le \chi_m + d\chi_m]
= P[\chi_1 \le x(z_1 + \delta z) \le \chi_1 + d\chi_1, \ldots, \chi_m \le x(z_m + \delta z) \le \chi_m + d\chi_m], \tag{3.12}$$

or

$$f_{z_1,\ldots,z_m}(\chi_1, \ldots, \chi_m) = f_{z_1+\delta z,\ldots,z_m+\delta z}(\chi_1, \ldots, \chi_m) \tag{3.13}$$

for all $m = 1, 2, \ldots$ Space homogeneous/time stationary RF occur, for example,
in the case of blackbody radiation within a large cavity maintained at a
constant temperature.

An OS/TRF $X(s,t)$ will be called space homogeneous/time stationary
in the wide sense if its mean and covariance do not change under a shift of
the parameters, i.e.,

$$m_x(s,t) = \text{constant} \tag{3.14}$$

and

$$c_x(s,t;s',t') = c_x(h, \tau), \tag{3.15}$$

where $h = s - s'$ and $\tau = t - t'$. In other words, there exists in the closed
linear subspace $H$ spanned by the random variables $x$ in $L_2(\Omega, F, P)$ a group
of unitary operators $U_{h,\tau}$ such that

$$U_{h,\tau} X(s,t) = S_{h,\tau} X(s,t), \tag{3.16}$$

where $s, h \in \mathbb{R}^n$ and $t, \tau \in T$ (here, $S_{h,\tau} X(s,t) = X(s + h, t + \tau)$ is the
shift operator). It is easily seen that in the case of space homogeneous/time
stationary fields the covariance (3.6) and the semivariogram (3.7) are related
by (assuming a zero mean field)

$$c_x(h, \tau) = c_x(0, 0) - \gamma_x(h, \tau). \tag{3.17}$$

The set of all space homogeneous/time stationary ordinary fields will be
denoted by $\mathcal{K}_0$.

The space homogeneous/time stationary RF $X(s,t)$ admits the
Fourier-Stieltjes representation

$$X(s,t) = \iint \exp[i(w \cdot s + \lambda t)] \, d\tilde{X}(w, \lambda), \tag{3.18}$$

where $\tilde{X}(w, \lambda)$ is a random field such that $C_x(w, \lambda) \, \delta(w - w') \, \delta(\lambda - \lambda') =
E[d\tilde{X}(w, \lambda) \, d\tilde{X}(w', \lambda')]$, where $C_x(w, \lambda)$ is the spectral function satisfying
the spectral representation of the covariance $c_x(h, \tau)$, viz.

$$c_x(h, \tau) = \iint \exp[i(w \cdot h + \lambda \tau)] \, C_x(w, \lambda) \, dw \, d\lambda, \tag{3.19}$$

and

$$C_x(w, \lambda) = \frac{1}{(2\pi)^{n+1}} \iint \exp[-i(w \cdot h + \lambda \tau)] \, c_x(h, \tau) \, dh \, d\tau. \tag{3.20}$$

Since the covariance $c_x(h, \tau)$ is a nonnegative-definite function, according
to Bochner's theorem

$$C_x(w, \lambda) \ge 0 \tag{3.21}$$

for all $w$, $\lambda$.

The $c_x(h, \tau)$ will be termed space-time separable if

$$c_x(h, \tau) = c_x(h) \, c_x(\tau). \tag{3.22}$$

Clearly, this implies $C_x(w, \lambda) = C_x(w) \, C_x(\lambda)$. When physically justified,
separability is an extremely convenient property from a mathematical point
of view.

In practice, one usually makes an additional assumption, namely that of
space isotropic/time stationary RF: the covariance and spectral functions are

$$c_x(h, \tau) = c_x(r, \tau) \tag{3.23}$$

and

$$C_x(w, \lambda) = C_x(\omega, \lambda), \tag{3.24}$$

where $r = |h|$ and $\omega = |w|$.

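In order that $c_x(r, \tau)$ be a covariance function of a space isotropic/
time stationary RF, it is necessary and sufficient that this function admits a
representation of the form

$$c_x(r, \tau) = (2\pi)^{n/2} \int_0^{\infty} \int_{-\infty}^{\infty} \exp[i \lambda \tau] \,
\frac{J_{(n-2)/2}(\omega r)}{(\omega r)^{(n-2)/2}} \, \omega^{n-1} \, C_x(\omega, \lambda) \, d\omega \, d\lambda,$$

where $J_{(n-2)/2}$ is the Bessel function of the first kind of order $(n-2)/2$ and
$C_x(\omega, \lambda) \ge 0$ on the half-plane $(\omega, \lambda)$, $\omega \in [0, \infty)$, $\lambda \in (-\infty, \infty)$. A
similar condition holds true in terms of the semivariogram in (3.17).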

Other combinations of spatial homogeneity and temporal stationarity,
in the strict or the wide sense, are also possible [7]. Lastly, the space-time
covariances satisfy relationships similar to those for purely SRF; for example,
for a time stationary S/TRF (in the wide sense) it is valid that

$$|c_x(s, s'; \tau)| \le \sqrt{c_x(s, s; 0) \, c_x(s', s'; 0)}.$$

4.  Generalized  spatiotemporal  random  fields 

4.1.  Definition  and  basic  properties 

In dealing with space nonhomogeneous and/or time nonstationary natural
processes it will be useful to introduce the notion of generalized S/TRF.
The latter is an extension in the space-time context of the notion of random
distribution due to Ito [18] and Gel'fand [13]. Let $Q$ be some specified linear
space of elements $q$ and let $L_2(\Omega, F, P)$ be the Hilbert space of all
random variables $x(q)$ on $Q$ endowed with the scalar product

$$(x(q_1), x(q_2)) = E[x(q_1) x(q_2)] = \iint \chi_1 \chi_2 \, dF_x(\chi_1, \chi_2), \tag{4.1}$$

where $F_x(\chi_1, \chi_2)$ denotes the joint probability distribution of the random
variables $x(q_1)$, $x(q_2)$, with $\|x(q)\|^2 = E|x(q)|^2 < \infty$, and satisfying the following
linearity condition

$$x\Big(\sum_{i=1}^{N} \lambda_i q_i\Big) = \sum_{i=1}^{N} \lambda_i \, x(q_i) \tag{4.2}$$

for all $q_i \in Q$ and all (real or complex) numbers $\lambda_i$ ($i = 1, \ldots, N$). The
elements $q \in Q$ are functions $q(s,t)$. Among the spaces $Q$ that are
suitable for the purpose of this study are the spaces $K$ and $S$ of Section 3.1.


Definition 4.1. A generalized S/TRF (GS/TRF) $X(q)$ on $Q$ is the random
mapping

$$X : Q \to L_2(\Omega, F, P). \tag{4.3}$$

The GS/TRF considered will always be assumed to be continuous in the
sense that $E|X(q_n) - X(q)|^2 \to 0$ when $q_n \to q$. The set of all continuous
GS/TRF on $Q$ will be denoted by $\mathcal{G}$.

The second order characteristics of the GS/TRF are the spatiotemporal
mean value

$$m_x(q) = E[X(q)] = \int \chi \, dF_x(\chi), \tag{4.4}$$

where $F_x(\chi)$ denotes the probability distribution of $X(q)$, and the

$$c_x(q_1, q_2) = E[(X(q_1) - m_x(q_1))(X(q_2) - m_x(q_2))], \tag{4.5}$$

which will be called the (centered) spatiotemporal covariance functional of
the GS/TRF $X(q)$. Both the mean and the covariance functional will be
assumed to be real-valued and continuous relative to the topology of $Q$.
Also, a useful second-order characteristic is the spatiotemporal structure or
semivariogram functional which is defined by

$$\gamma_x(q_1, q_2) = \tfrac{1}{2} E[X(q_1) - X(q_2)]^2. \tag{4.6}$$

Finally, mathematically equivalent space-time second order functionals may
be constructed in the frequency domain by taking the Fourier transform of
the covariance and the semivariogram functionals.
4.2. Continuous linear functional representation
of generalized spatiotemporal random fields

In the sequel we will concentrate on GS/TRF which are of the continuous
linear functional form (CLF)

$$X(q) = \langle q(s,t), X(s,t) \rangle = \iint q(s,t) \, X(s,t) \, ds \, dt, \tag{4.7}$$

where $q \in Q$ and $X(s,t)$ is an OS/TRF in the sense of Definition 3.1 above.
Depending on the choice of the function $q$, the CLF (4.7) may admit a variety
of physical interpretations. Let us consider the following example.

Example 4.2. Assume that $X(s,t)$ represents the concentration of an aerosol
substance in the atmosphere. By choosing $q(s,t) = \delta(s - s') \, \delta(t - t')$, (4.7)
gives the value of the substance at the point/instant $(s', t')$. If one lets

$$q(s,t) = \begin{cases} 1, & \text{if } s \in V \text{ and } t \in [t_1, t_2] \\ 0, & \text{otherwise,} \end{cases}$$

(4.7) provides the total amount of substance in the volume $V$ during the time
period $[t_1, t_2]$.
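The second choice of $q$ in Example 4.2 can be sketched numerically as follows. The smooth realization `realization`, the interval $V = [0, 1]$, the time window $[0, 2]$, and the midpoint rule are all assumptions introduced for illustration; a real application would replace them with observed data and a suitable quadrature.

```python
import math


def realization(s, t):
    """Hypothetical smooth realization of the concentration X(s, t), one spatial dimension."""
    return math.exp(-s) * (1.0 + 0.5 * t)


def indicator_q(s, t, volume=(0.0, 1.0), window=(0.0, 2.0)):
    """q(s, t) = 1 if s lies in V and t in [t1, t2], and 0 otherwise."""
    inside = volume[0] <= s <= volume[1] and window[0] <= t <= window[1]
    return 1.0 if inside else 0.0


def clf(q, x, s_range, t_range, n=200):
    """Midpoint-rule approximation of the double integral in (4.7)."""
    ds = (s_range[1] - s_range[0]) / n
    dt = (t_range[1] - t_range[0]) / n
    total = 0.0
    for i in range(n):
        s = s_range[0] + (i + 0.5) * ds
        for j in range(n):
            t = t_range[0] + (j + 0.5) * dt
            total += q(s, t) * x(s, t)
    return total * ds * dt


# Integrate over a domain larger than the support of q; q removes the rest.
total_amount = clf(indicator_q, realization, (-1.0, 3.0), (0.0, 4.0))

# Closed form for this particular realization: (1 - e^{-1}) * (2 + 1).
exact = 3.0 * (1.0 - math.exp(-1.0))
print(total_amount, exact)
```

The grid is chosen so that the boundaries of $V$ and of the time window fall on cell edges, which keeps the discontinuity of the indicator from degrading the midpoint rule.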

Since a GS/TRF $X(q)$ cannot be assigned values at isolated points/
instants $(s,t)$ (unless $q$ is a delta function), we introduce the following
field.

Definition 4.3. A convoluted S/TRF (CS/TRF) is defined as the S/TRF

$$Y_q(s,t) = \langle q(s',t'), S_{s,t} X(s',t') \rangle = \iint q(s',t') \, S_{s,t} X(s',t') \, ds' \, dt'. \tag{4.8}$$

We can now make the following observations: The CS/TRF (4.8) is
characterized by $Y_q(0,0) = X(q)$ for all $q \in Q$. Also, it holds true that
$Y_q(s,t) = S_{s,t} X(q) = X(S_{-s,-t} q)$ for all $q \in Q$ and all $(s,t) \in \mathbb{R}^n \times T$. The
space $\mathcal{K}$ of OS/TRF may be considered as a subset of the space $\mathcal{G}$ of GS/TRF,
viz. $\mathcal{K} \subset \mathcal{G}$. Moreover, the fields $X(q)$ and $Y_q(s,t)$ have certain important
properties, as follows.
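The two identities $Y_q(0,0) = X(q)$ and $Y_q(s,t) = X(S_{-s,-t} q)$ can be checked on a lattice, where the integrals reduce to finite sums. The sketch below is only a discrete analogue under assumed inputs: `x_realization` is a hypothetical deterministic realization, and `q` is an arbitrary weight function with finite support stored as a dictionary.

```python
import math


def x_realization(s, t):
    """Hypothetical realization X(s, t) on an integer space-time lattice."""
    return math.sin(0.3 * s) + 0.1 * t * t


def clf(q, x):
    """Discrete analogue of (4.7): sum of q(s, t) * X(s, t) over the support of q."""
    return sum(w * x(s, t) for (s, t), w in q.items())


def shift_weights(q, h, tau):
    """(S_{h,tau} q)(s, t) = q(s + h, t + tau), so the support moves by (-h, -tau)."""
    return {(s - h, t - tau): w for (s, t), w in q.items()}


def convoluted_field(q, x, s, t):
    """Discrete analogue of (4.8): sum of q(s', t') * X(s' + s, t' + t)."""
    return sum(w * x(s2 + s, t2 + t) for (s2, t2), w in q.items())


q = {(0, 0): 0.5, (1, 0): 0.25, (0, 1): 0.25}

y_origin = convoluted_field(q, x_realization, 0, 0)
x_of_q = clf(q, x_realization)

y_shifted = convoluted_field(q, x_realization, 4, 2)
x_of_shifted_q = clf(shift_weights(q, -4, -2), x_realization)

print(y_origin, x_of_q)
print(y_shifted, x_of_shifted_q)
```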

Property 4.4. The means and covariances of $X(q)$ and $Y_q(s,t)$ write

$$m_x(q) = E[X(q)] = \langle m_x(s,t), q(s,t) \rangle, \tag{4.9}$$

$$m_Y(s,t) = E[Y_q(s,t)] = \langle m_{S_{s,t}X}(s',t'), q(s',t') \rangle, \tag{4.10}$$

$$c_x(q_1, q_2) = E[(X(q_1) - m_x(q_1))(X(q_2) - m_x(q_2))]
= \langle \langle c_x(s,t;s',t'), q_1(s,t) \rangle, q_2(s',t') \rangle, \tag{4.11}$$

$$c_Y(s,t;s',t') = E[(Y_{q_1}(s,t) - m_Y(s,t))(Y_{q_2}(s',t') - m_Y(s',t'))]
= \langle \langle c_x(S_{s,t} X(s'',t''), S_{s',t'} X(s''',t''')), q_1(s'',t'') \rangle, q_2(s''',t''') \rangle. \tag{4.12}$$

The means and covariances of the GS/TRF and CS/TRF are linearly related to
those of the corresponding OS/TRF. From (4.10) and (4.12) we find that the
corresponding mean values and covariance functions write, respectively,

$$m_x(q) = m_Y(0,0), \tag{4.13}$$

$$c_x(q_1, q_2) = c_Y(0,0;0,0). \tag{4.14}$$


Property 4.5. The covariance functional of the GS/TRF $X(q)$ is a
nonnegative-definite bilinear functional in the sense that

$$c_x(q, q) = E[(X(q) - m_x(q))^2] \ge 0 \tag{4.15}$$

for all $q \in Q$. Conversely, every continuous nonnegative-definite bilinear
functional $c_x(q_1, q_2)$ in $Q$ is a covariance functional of some GS/TRF $X(q)$.

Property 4.6. The fields $X(q)$ and $Y_q(s,t)$ are always differentiable, even when
$X(s,t)$ is not. To see this assume $Q = K$ and let

$$X^{(\rho,\zeta)}(q) = \langle q(s,t), X^{(\rho,\zeta)}(s,t) \rangle = \iint q(s,t) \, X^{(\rho,\zeta)}(s,t) \, ds \, dt, \tag{4.16}$$

where $\zeta$ is a nonnegative integer and $\rho = (\rho_1, \rho_2, \ldots, \rho_n)$ is a multi-index of
nonnegative integers; i.e., the superscript $(\rho,\zeta)$ denotes partial differentiation
of the order $\rho$ in space and differentiation of order $\zeta$ in time

$$X^{(\rho,\zeta)}(s,t) = D^{(\rho,\zeta)} X(s,t) = \frac{\partial^{|\rho|}}{\partial s_1^{\rho_1} \cdots \partial s_n^{\rho_n}} \, \frac{\partial^{\zeta}}{\partial t^{\zeta}} \, X(s,t), \tag{4.17}$$

where $\rho = |\rho| = \sum_{i=1}^{n} \rho_i$. By applying integration by parts (4.16) writes

$$X^{(\rho,\zeta)}(q) = (-1)^{\rho+\zeta} X(q^{(\rho,\zeta)}). \tag{4.18}$$

Similarly for the CS/TRF,

$$Y_q^{(\rho,\zeta)}(s,t) = (-1)^{\rho+\zeta} S_{s,t} X(q^{(\rho,\zeta)}). \tag{4.19}$$

Therefore, although there may exist no $X^{(\rho,\zeta)}(s,t)$ as such, we can always
obtain $X^{(\rho,\zeta)}(q)$ and $Y_q^{(\rho,\zeta)}(s,t)$ in the sense defined above.
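A one-dimensional, time-only illustration of (4.18) with $\rho = 0$, $\zeta = 1$ may help. In the sketch below the test function `bump` (an element of $K$ supported on $[-1, 1]$), the non-differentiable realization $|t - 0.3|$, and the midpoint quadrature are assumptions chosen for the example. The quantity $-\langle q', X \rangle$ is well defined although $X$ has a kink, and it agrees with the pairing of $q$ against the weak derivative of $X$.

```python
import math


def bump(t):
    """Infinitely differentiable test function with support [-1, 1]."""
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1.0 else 0.0


def bump_derivative(t):
    """Derivative of bump: bump(t) * (-2 t) / (1 - t^2)^2 inside the support."""
    if abs(t) >= 1.0:
        return 0.0
    return bump(t) * (-2.0 * t) / (1.0 - t * t) ** 2


def midpoint_integral(f, a, b, n=20000):
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def x_realization(t):
    """Hypothetical realization with a kink at t = 0.3 (not differentiable there)."""
    return abs(t - 0.3)


def x_weak_derivative(t):
    """Weak derivative of x_realization: the sign of (t - 0.3)."""
    return 1.0 if t > 0.3 else -1.0


# Right-hand side of (4.18) with rho = 0, zeta = 1: (-1)^1 X(q').
derivative_via_q = -midpoint_integral(
    lambda t: bump_derivative(t) * x_realization(t), -1.0, 1.0)

# Pairing of q with the weak derivative, for comparison.
derivative_direct = midpoint_integral(
    lambda t: bump(t) * x_weak_derivative(t), -1.0, 1.0)

print(derivative_via_q, derivative_direct)
```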


Property 4.7. By applying the Riesz-Radon theorem in terms of generalized
functions we find that the mean $m_x(q)$ can be written as

$$m_x(q) = \sum_{\rho \le \nu} \sum_{\zeta \le \mu} \langle q^{(\rho,\zeta)}(s,t), f_{\rho,\zeta}(s,t) \rangle, \tag{4.20}$$

where $\nu$ and $\mu$ are nonnegative integers, $q(s,t) \in K$ and $f_{\rho,\zeta}(s,t)$ are continuous
functions in $\mathbb{R}^n \times T$, only a finite number of which are different from
zero on any given finite support $U$ of $K$. Integration by parts yields

$$m_x(q) = \sum_{\rho \le \nu} \sum_{\zeta \le \mu} (-1)^{\rho+\zeta} \langle f_{\rho,\zeta}^{(\rho,\zeta)}(s,t), q(s,t) \rangle. \tag{4.21}$$

A similar expression may be derived for the mean $m_Y(s,t)$ of $Y_q(s,t)$, namely

$$m_Y(s,t) = \sum_{\rho \le \nu} \sum_{\zeta \le \mu} (-1)^{\rho+\zeta} \langle S_{s,t} f_{\rho,\zeta}^{(\rho,\zeta)}(s',t'), q(s',t') \rangle. \tag{4.22}$$

For convenience in the subsequent analysis let us put $g_{\rho,\zeta}(s,t) = f_{\rho,\zeta}^{(\rho,\zeta)}(s,t)$.

Closely related to Property 4.7 is the following section.


4.3. Space homogeneous/time stationary
generalized spatiotemporal random fields

A GS/TRF $X(q)$, $q(s,t) \in Q$, $(s,t) \in \mathbb{R}^n \times T$ will be called space
homogeneous/time stationary in the wide sense if its mean value $m_x(q)$ and
covariance functional $c_x(q_1, q_2)$ are invariant with respect to any shift of the
parameters, that is

$$m_x(q) = m_x(S_{h,\tau} q), \tag{4.23}$$

$$c_x(q_1, q_2) = c_x(S_{h,\tau} q_1, S_{h,\tau} q_2) \tag{4.24}$$

for any $(h, \tau) \in \mathbb{R}^n \times T$. Clearly, when the $X(q)$ is space homogeneous/
time stationary the $c_x(q_1, q_2)$ is a translation-invariant, nonnegative-definite
bilinear functional on $Q$, and the following proposition can be proven [6].

Proposition 4.8. If $X(q)$ is a space homogeneous/time stationary GS/TRF
on $Q$, there exists one and only one generalized functional $c_x(q_1, q_2) \in Q'$
such that $(X(q_1), X(q_2)) = c_x(q_1, q_2)$, $q_1, q_2 \in Q$.

We shall denote by $\mathcal{G}_0$ the set of all space homogeneous/time stationary
generalized fields. Note that $\mathcal{K}_0 \subset \mathcal{G}_0 \subset \mathcal{G}$. Similarly, the CS/TRF $Y_q(s,t)$ is
called space homogeneous/time stationary if

$$m_Y(s,t) = \text{constant}, \tag{4.25}$$

$$c_Y(s,t;s',t') = c_Y(h, \tau), \tag{4.26}$$

where $h = s - s'$, $\tau = t - t'$, for any $(h, \tau) \in \mathbb{R}^n \times T$. In view of (4.22)
and condition (4.23) it follows that the functions $f_{\rho,\zeta}(s,t)$ are constants.
Therefore

$$g_{\rho,\zeta}(s,t) = f_{\rho,\zeta}^{(\rho,\zeta)}(s,t) = \begin{cases} 0, & \text{for all } \rho \ge 1 \\ f_{0,0}^{(0,0)}(s,t) = m, & \text{for } \rho = \zeta = 0 \end{cases} \tag{4.27}$$

and the $m_x(q)$ will have the form

$$m_x(q) = m \iint q(s,t) \, ds \, dt = m \langle q(s,t), 1 \rangle. \tag{4.28}$$

The $c_x(q_1, q_2) \in Q'$ can be expressed in terms of the corresponding $c_x(h, \tau)$
as follows

$$c_x(q_1, q_2) = \langle c_x(h, \tau), q_1 * \check{q}_2(h, \tau) \rangle = c_x(q_1 * \check{q}_2) \tag{4.29}$$

for all $q_1, q_2 \in Q$, where $*$ denotes convolution and $\check{\ }$ denotes inversion (i.e.,
$\check{q}_2(h, \tau) = q_2(-h, -\tau)$).
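Relation (4.29) has a direct discrete counterpart that is easy to verify: the double sum $\sum q_1(s,t) \, q_2(s',t') \, c_x(s - s', t - t')$ equals the pairing of $c_x(h, \tau)$ with $(q_1 * \check{q}_2)(h, \tau) = \sum_u q_1(u) \, q_2(u - (h, \tau))$. The sketch below assumes a hypothetical stationary covariance `cov` and arbitrary finitely supported weights on an integer lattice; none of these values come from the text.

```python
import math


def cov(h, tau):
    """Hypothetical stationary covariance c_x(h, tau), one spatial dimension."""
    return math.exp(-abs(h)) * math.exp(-0.5 * abs(tau))


def bilinear_form(q1, q2, c):
    """Direct double sum of q1(s, t) * q2(s', t') * c(s - s', t - t')."""
    return sum(a * b * c(s - s2, t - t2)
               for (s, t), a in q1.items()
               for (s2, t2), b in q2.items())


def convolve_with_inversion(q1, q2):
    """(q1 * q2_check)(h, tau), accumulated lag by lag."""
    out = {}
    for (s, t), a in q1.items():
        for (s2, t2), b in q2.items():
            lag = (s - s2, t - t2)
            out[lag] = out.get(lag, 0.0) + a * b
    return out


q1 = {(0, 0): 1.0, (1, 0): -2.0, (2, 1): 0.5}
q2 = {(0, 1): 0.3, (1, 1): 1.5, (3, 2): -1.0}

direct = bilinear_form(q1, q2, cov)
lagged = convolve_with_inversion(q1, q2)
via_convolution = sum(w * cov(h, tau) for (h, tau), w in lagged.items())

print(direct, via_convolution)
```

Working with the lagged weights is convenient in practice because the covariance then needs to be evaluated only once per distinct lag.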

Example 4.9. Let us define in $\mathbb{R}^1 \times T$ a zero mean Wiener S/TRF $W(s,t)$,
$s \in [s_1, s_2]$, $t \in [0, \infty)$ as a Gaussian S/TRF with covariance function

$$c_W(s,t;s',t') = \min(s - s_1, s' - s_1) \, \min(t, t'). \tag{4.30}$$

The $X(s,t) = \partial^2 W(s,t) / \partial s \, \partial t$ will be zero mean white noise S/TRF with
covariance function

$$c_x(h, \tau) = \delta(h, \tau), \tag{4.31}$$

where $h = s - s'$, $\tau = t - t'$ and $\delta(h, \tau)$ is the spatiotemporal delta function.
The corresponding GS/TRF $X(q) = \langle X(s,t), q(s,t) \rangle$ has covariance

$$c_x(q_1, q_2) = \delta(q_1 * \check{q}_2). \tag{4.32}$$

The above results can be generalized to more than one spatial dimension.
More specifically, one may define in $\mathbb{R}^n \times T$ the so-called Brownian
sheet $W(s,t)$ which is a zero mean Gaussian S/TRF with covariance

$$c_W(s,t;s',t') = \min(s_1, s'_1) \cdots \min(s_n, s'_n) \, \min(t, t'). \tag{4.33}$$

The Brownian sheet has important applications in the context of stochastic partial
differential equations.
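The covariance (4.33) is simple to evaluate, and its nonnegative-definiteness can be probed on a finite set of points through a Cholesky factorization. The sketch below is a minimal illustration: the four sample points in $\mathbb{R}^2 \times T$ are arbitrary assumptions, and the hand-written `cholesky` helper is included only to keep the example self-contained.

```python
import math


def brownian_sheet_cov(s, t, s2, t2):
    """Covariance (4.33): product of min(s_i, s'_i) over space, times min(t, t')."""
    value = min(t, t2)
    for a, b in zip(s, s2):
        value *= min(a, b)
    return value


def cholesky(matrix):
    """Lower-triangular factor; raises ValueError if the matrix is not positive definite."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = matrix[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                if acc <= 0.0:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(acc)
            else:
                lower[i][j] = acc / lower[j][j]
    return lower


# Arbitrary sample points ((s_1, s_2), t) with positive coordinates.
points = [((0.5, 1.0), 0.2), ((1.5, 0.7), 1.0), ((2.0, 2.0), 0.6), ((0.9, 1.8), 1.4)]
gram = [[brownian_sheet_cov(s, t, s2, t2) for (s2, t2) in points] for (s, t) in points]
factor = cholesky(gram)

# The variance at a point is the product of its coordinates: 2.0 * 2.0 * 0.6.
variance = brownian_sheet_cov((2.0, 2.0), 0.6, (2.0, 2.0), 0.6)
print(variance)
```

Multiplying `factor` by a vector of independent standard normal draws would give one way to simulate the sheet at these points, though that step is not shown here.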

In the light of the Fourier transform properties of generalized functions,
it is valid that $c_x(q_1, q_2) = \langle c_0, q_1 * \check{q}_2 \rangle = \langle \phi, \tilde{q}_1 \overline{\tilde{q}_2} \rangle$, which yields the
following result (see also [6]).

Proposition 4.10. Let $X(q)$ be a GS/TRF in $\mathbb{R}^n \times T$. The covariance functional
writes

$$c_x(q_1, q_2) = \iint \tilde{q}_1(w, \lambda) \, \overline{\tilde{q}_2(w, \lambda)} \, d\phi(w, \lambda), \tag{4.34}$$

where $\tilde{q}_1(w, \lambda)$ and $\tilde{q}_2(w, \lambda)$ are the Fourier transforms of the $q_1(s,t)$ and
$q_2(s,t)$ respectively, and $\phi(w, \lambda)$ is some positive tempered measure in $\mathbb{R}^n \times T$.
In this case the $\phi(w, \lambda)$ is called the spectral measure of the GS/TRF $X(q)$.

Example 4.11. Consider once more Example 4.9 above. Since $c_x(q_1, q_2) =
\langle c_0, q_1 * \check{q}_2 \rangle = \langle \phi, \tilde{q}_1 \overline{\tilde{q}_2} \rangle$, and the Fourier transform of $c_0 = \delta$ is $dw \, d\lambda$
(Lebesgue space-time measure), we conclude that the spectral measure of
$X(q)$ is $d\phi(w, \lambda) = dw \, d\lambda$.

Space homogeneous/time stationary analysis yields the next property.

Property 4.12. The CS/TRF $Y_q(s,t)$ can be zero mean space homogeneous/
time stationary even when the associated OS/TRF $X(s,t)$ is space
nonhomogeneous/time nonstationary. This can happen under certain conditions
concerning the choice of the functions $q(s,t)$ as well as the form of the
functions $g_{\rho,\zeta}(s,t)$. More specifically, we must define spaces

$$Q_{\nu/\mu} = \{ q \in Q : \langle q(s,t), g_{\rho,\zeta}(s,t) \rangle = 0 \ \text{for all } \rho \le \nu, \ \zeta \le \mu \}, \tag{4.35}$$

and

$$C_{\nu/\mu} = \{ g_{\rho,\zeta}(s,t) \in C : \langle q(s,t), g_{\rho,\zeta}(s,t) \rangle = 0 \Rightarrow
\langle q(s,t), S_{h,\tau} g_{\rho,\zeta}(s,t) \rangle = 0 \ \text{for all } \rho \le \nu, \ \zeta \le \mu \}, \tag{4.36}$$

where $C$ is the space of continuous functions in $\mathbb{R}^n \times T$ with compact support.
Condition (4.35) assures a zero mean value for the CS/TRF $Y_q(s,t)$ at $(s,t) = (0,0)$,
while the closedness of $C_{\nu/\mu}$ under translation (equation (4.36)) is necessary in
order that stochastic inference about $X(q)$ makes sense (i.e., in order that
the stochastic correlation properties of $X(q)$ remain unaffected by a shift
$S_{h,\tau}$ of the space/time origin). Functions that satisfy these conditions are of
the form

$$g_{\rho,\zeta}(s,t) = s^{\rho} t^{\zeta} \exp[\alpha \cdot s + \beta t], \tag{4.37}$$

where $\alpha$ and $\beta$ are a (real or complex) vector and number, respectively.

From a practical point of view, the modelling of spatio-chronological
variations and the estimation of spatiotemporal processes is easier and more
efficiently carried out when the $g_{\rho,\zeta}(s,t)$ are pure polynomials, viz.

$$g_{\rho,\zeta}(s,t) = s_1^{\rho_1} \cdots s_n^{\rho_n} t^{\zeta},$$

where $\rho = |\rho| = \sum_{i=1}^{n} \rho_i$. This is due mainly to convenient invariance and
linearity properties that the latter satisfy. In conclusion, the "derived" fields
$X(q)$ and $Y_q(s,t)$ have a very convenient mathematical structure. From a
physical viewpoint this means that even if $X(s,t)$ represents an actual natural
process which has, in general, very irregular, space nonhomogeneous/time
nonstationary features, we can derive fields $X(q)$ and $Y_q(s,t)$ which have
regular, space homogeneous/time stationary features. Hence, analysis and
processing are much easier.

5. Spatiotemporal random fields of order $\nu/\mu$
(ordinary and generalized)

5.1. Random fields with space homogeneous/time stationary increments

We now come to what is, for our present concern, the most interesting aspect of S/TRF, namely the concept of S/TRF with space homogeneous/time stationary increments of orders $\nu$ in space and $\mu$ in time, in the ordinary or in the generalized sense.

Definition 5.1. A CS/TRF $Y_q(s,t)$ will be called a CS/TRF of order $\nu$ in space/$\mu$ in time (CS/TRF-$\nu/\mu$) if $q \in Q_{\nu/\mu}$. In this case the space $Q_{\nu/\mu}$ will be termed an admissible space of order $\nu/\mu$ (AS-$\nu/\mu$).

Definition 5.2. Let $Q_{\nu/\mu}$ be an AS-$\nu/\mu$. A GS/TRF $X(q)$ with space homogeneous of order $\nu$/time stationary of order $\mu$ increments (GS/TRF-$\nu/\mu$) is a linear mapping

$$X : Q_{\nu/\mu} \to L_2(\Omega, F, P), \tag{5.1}$$

where the corresponding CS/TRF $Y_q(s,t)$ is zero mean space homogeneous/time stationary for all $q \in Q_{\nu/\mu}$ and all $(h,\tau) \in R^n \times T$.

The set of all continuous GS/TRF-$\nu/\mu$ will be denoted by $\mathcal{G}_{\nu/\mu}$. The definition above is equivalent to the following one, which we unfold in an extended Ito-Gel'fand spirit:

Definition 5.3. A GS/TRF-$\nu/\mu$ $X(q)$ is a GS/TRF for which all differential operators of the form

$$Y(q) = D^{(\nu+1,\mu+1)}X(q), \tag{5.2}$$

where $D^{(\nu+1,\mu+1)}X(q) = X^{(\nu+1,\mu+1)}(q) = (-1)^{\nu+\mu}X(q^{(\nu+1,\mu+1)})$, are zero mean space homogeneous/time stationary generalized fields.

The OS/TRF associated with the space $Q_{\nu/\mu}$ will be defined as follows.

Definition 5.4. An OS/TRF $X(s,t)$ is called an OS/TRF of order $\nu/\mu$ (OS/TRF-$\nu/\mu$) if for all $q \in Q_{\nu/\mu}$ the corresponding CS/TRF $Y_q(s,t)$ is zero mean space homogeneous/time stationary.

In light of Definition 5.4, if the

$$Y(s,t) = D^{(\nu+1,\mu+1)}X(s,t) = X^{(\nu+1,\mu+1)}(s,t) \tag{5.3}$$

exist and are zero mean space homogeneous/time stationary fields, then the $X(s,t)$ is an OS/TRF-$\nu/\mu$.

In connection with this, the following propositions can be proven [6].

Proposition 5.5. The solutions of the stochastic partial differential equation in $R^1 \times T$

$$D^{(\nu+1,\mu+1)}X_{\nu/\mu}(s,t) = Y(s,t), \tag{5.4}$$

where $Y(s,t)$ is zero mean space homogeneous/time stationary, are OS/TRF-$\nu/\mu$. Note that in this case the

$$Y(q) = \langle q(s,t), Y(s,t)\rangle = \langle q(s,t), X_{\nu/\mu}^{(\nu+1,\mu+1)}(s,t)\rangle = X_{\nu/\mu}^{(\nu+1,\mu+1)}(q)$$

is a space homogeneous/time stationary generalized field.

Proposition 5.6. The OS/TRF

$$X(s,t) = \sum_{\rho=0}^{\nu}\sum_{\zeta=0}^{\mu}\beta_{\rho,\zeta}\,g_{\rho,\zeta}(s,t), \tag{5.6}$$

where $\beta_{\rho,\zeta}$ ($\rho \le \nu$ and $\zeta \le \mu$) are random variables in $\mathcal{H}_k = L_2(\Omega, F, P)$, is an OS/TRF-$\nu/\mu$.

In view of (4.7) and (4.8), to each generalized $X(q)$ correspond various ordinary $X(s,t)$, all having the same CS/TRF $Y_q(s,t) = X(S_{-s,-t}\,q)$; that is, we can write

$$X(q) \to \{X^{a}(s,t),\ a = 1, 2, \ldots\}:\quad Y_q(s,t) = X(S_{-s,-t}\,q(s,t)). \tag{5.7}$$

Hence,

Definition 5.7. The set $X_q = \{X^{a}(s,t),\ a = 1, 2, \ldots\}$ of all OS/TRF-$\nu/\mu$ which have the same CS/TRF-$\nu/\mu$ $Y_q(s,t)$ in $Q$ will be termed the generalized representation set of order $\nu/\mu$ (GRS-$\nu/\mu$). Each member of the GRS-$\nu/\mu$ will be called a representation of the $X(q)$.

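We can now state the proposition below [6].

Proposition 5.8. Let $X^{0}(s,t)$ be a representation $\in X_q$ of $X(q)$. The OS/TRF $X^{a}(s,t)$ is another representation if and only if it can be expressed as

$$X^{a}(s,t) = X^{0}(s,t) + \sum_{\rho\le\nu}\sum_{\zeta\le\mu}c_{\rho,\zeta}\,g_{\rho,\zeta}(s,t), \tag{5.8}$$

where the $c_{\rho,\zeta}$, $\rho \le \nu$ and $\zeta \le \mu$, are random variables in $L_2(\Omega, F, P)$ s.t.

$$c_{\rho,\zeta} = \langle \eta_{\rho,\zeta}(s,t), X^{a}(s,t)\rangle, \tag{5.9}$$

where the $\eta_{\rho,\zeta}(s,t)$ satisfy the

$$\langle \eta_{\rho,\zeta}(s,t), g_{\rho',\zeta'}(s,t)\rangle = \begin{cases}1 & \text{if } \rho = \rho' \text{ and } \zeta = \zeta',\\ 0 & \text{otherwise.}\end{cases} \tag{5.10}$$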

An OS/TRF-$\nu/\mu$ is not always differentiable. It can, however, be expressed in terms of a differentiable OS/TRF-$\nu/\mu$ as shown in the proposition below [6].

Proposition 5.9. Let $X(s,t)$ be a continuous OS/TRF-$\nu/\mu$. Then it follows

$$X(s,t) = X^{*}(s,t) + Y_q(s,t), \tag{5.11}$$

where $X^{*}(s,t)$ is an infinitely differentiable OS/TRF-$\nu/\mu$ and $Y_q(s,t)$ is a space homogeneous/time stationary random field.
5.2. The correlation structure of spatiotemporal random fields of order $\nu/\mu$

In this subsection, we will study the spatiotemporal trend and correlation structure of a S/TRF-$\nu/\mu$. In view of the preceding results, the generalized field $X^{(\nu+1,\mu+1)}(q)$ has constant mean, i.e., we can write

$$E[X^{(\nu+1,\mu+1)}(q)] = m^{(\nu+1,\mu+1)}(q) = (-1)^{\nu+\mu}\,m_X(q^{(\nu+1,\mu+1)}) = a\,q(0,0), \tag{5.12}$$

while the covariance functional is expressed as

$$c_X(q_1^{(\nu+1,\mu+1)}, q_2^{(\nu+1,\mu+1)}) = E[X(q_1^{(\nu+1,\mu+1)})\,X(q_2^{(\nu+1,\mu+1)})] = E[X^{(\nu+1,\mu+1)}(q_1)\,X^{(\nu+1,\mu+1)}(q_2)] = c_Y(q_1, q_2). \tag{5.13}$$

But as was shown above, the $c_Y(q_1,q_2)$ in (5.13) is a translation-invariant bilinear functional and, therefore, so is $c_X(q_1^{(\nu+1,\mu+1)}, q_2^{(\nu+1,\mu+1)})$. Taking into account the properties of bilinear functionals (for the relevant theory see [14]), (5.12) and (5.13) lead to the proposition below.

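Proposition 5.10. Let $X(q)$ be a GS/TRF-$\nu/\mu$ in $R^n \times T$. Its mean value and covariance functional have the following forms

$$m_X(q) = \sum_{0\le|\rho|\le\nu}\ \sum_{0\le\zeta\le\mu}a_{\rho,\zeta}\,\langle s^{\rho}t^{\zeta}, q(s,t)\rangle = \sum_{\rho_1}\sum_{\rho_2}\cdots\sum_{\rho_n}\sum_{\zeta}a_{\rho,\zeta}\,\langle s_1^{\rho_1}\cdots s_n^{\rho_n}t^{\zeta}, q(s,t)\rangle, \tag{5.14}$$

where $a_{\rho,\zeta}$ are suitable coefficients, $0 \le \rho = |\rho| = \sum_{i=1}^{n}\rho_i \le \nu$; and

$$c_X(q_1,q_2) = \iint_{\dot{R}^n\times\dot{T}} q_1(w,\lambda)\,q_2(w,\lambda)\,d\phi_X(w,\lambda) + G\big(q_1^{(\nu+1,\mu+1)}(0,0),\ q_2^{(\nu+1,\mu+1)}(0,0)\big), \tag{5.15}$$

where $\dot{R}^n = R^n - \{0\}$ and $\dot{T} = T - \{0\}$, $\phi_X$ is a certain positive tempered measure and $G$ is some function in $q_1^{(\nu+1,\mu+1)}(0,0)$ and $q_2^{(\nu+1,\mu+1)}(0,0)$.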

We proceed with the analysis of the spatiotemporal correlation structure of OS/TRF-$\nu/\mu$ by introducing the following definition.

Definition 5.11. Consider a continuous OS/TRF-$\nu/\mu$ $X(s,t)$. A continuous and symmetric function $k_X(h,\tau)$ in $R^n \times T$ is termed a generalized spatiotemporal covariance of order $\nu$ in space and $\mu$ in time (GS/TC-$\nu/\mu$) if and only if

$$\langle X(q_1), X(q_2)\rangle = \langle k_X(h,\tau),\ q_1(s,t)\,q_2(s',t')\rangle \ge 0, \tag{5.16}$$

where $h = s - s'$ and $\tau = t - t'$, for all $q_1, q_2 \in Q_{\nu/\mu}$.

In other words, in order that a given function be a permissible model of some GS/TC-$\nu/\mu$ it is necessary and sufficient that the condition (5.16) is satisfied. We saw above that with a particular GS/TRF-$\nu/\mu$ $X(q)$ we can associate a GRS-$\nu/\mu$ $X_q$ whose elements are the corresponding OS/TRF-$\nu/\mu$ $X(s,t)$. Similarly, with a particular $X(q)$ we can associate a set of GS/TC-$\nu/\mu$ satisfying Definition 5.11; this set will be called the generalized spatiotemporal covariance representation set of order $\nu/\mu$ (GS/TCRS-$\nu/\mu$). The concept of the GS/TC-$\nu/\mu$ $k_X(h,\tau)$ can be considered as the space-time extension of the purely spatial generalized covariance in the sense of [22].

We will see below that some interesting properties of the GS/TCRS-$\nu/\mu$ may be obtained by assuming that the GS/TC-$\nu/\mu$ is space isotropic, that is

$$k_X(h,\tau) = k_X(r,\tau), \tag{5.17}$$

where $r = |h|$.

Let us now explore (5.13) some more in light of Definition 5.11. We have

$$c_X(q_1^{(\nu+1,\mu+1)}, q_2^{(\nu+1,\mu+1)}) = \langle c_Y(s-s', t-t'),\ q_1(s,t)\,q_2(s',t')\rangle;$$

it also is true that

$$D^{(2\nu+2,\,2\mu+2)}c_X(s,t;s',t') = c_Y(s-s', t-t'). \tag{5.18}$$


The above partial differential equation can be solved with respect to $c_X(s,t;s',t')$. For illustration consider first the $R^1 \times T$ case: according to Proposition 5.5 above, if $X(s,t)$ is a differentiable OS/TRF-$\nu/\mu$ in $R^1 \times T$ such that $D^{(\nu+1,\mu+1)}X(s,t) = Y(s,t)$, the $Y(s,t)$ is space homogeneous/time stationary. The corresponding covariances of $X(s,t)$ and $Y(s,t)$ are related by $D^{(2\nu+2,\,2\mu+2)}c_X(s,t;s',t') = c_Y(r,\tau)$, where $r = s - s'$ and $\tau = t - t'$. The solution of this partial differential equation is

$$c_X(s,t;s',t') = k_X(r,\tau) + p_{\nu,\mu}(s,t;s',t'), \tag{5.19}$$

where

$$k_X(r,\tau) = (-1)^{\nu+\mu}\int_0^{r}\!\!\int_0^{\tau}\frac{(r-u)^{2\nu+1}(\tau-v)^{2\mu+1}}{(2\nu+1)!\,(2\mu+1)!}\,c_Y(u,v)\,du\,dv \tag{5.20}$$
is the corresponding GS/TC-$\nu/\mu$ and $p_{\nu,\mu}(s,t;s',t')$ is a polynomial of degree $\nu$ in $s$, $s'$ and $\mu$ in $t$, $t'$. (5.20) can be solved with respect to $c_Y(r,\tau)$, viz.

$$c_Y(r,\tau) = (-1)^{\nu+\mu}\frac{\partial^{2\nu+2\mu+4}}{\partial r^{2\nu+2}\,\partial\tau^{2\mu+2}}\,k_X(r,\tau). \tag{5.21}$$

In $R^n \times T$ the analysis above leads to the following proposition [6].

Proposition 5.12. Let $X(s,t)$ be an OS/TRF-$\nu/\mu$ in $R^n \times T$. Its covariance function can be expressed in the following form

$$c_X(s,t;s',t') = k_X(h,\tau) + p_{\nu,\mu}(s,t;s',t'), \tag{5.22}$$

where $k_X(h,\tau)$ ($h = s - s'$ and $\tau = t - t'$) is the associated GS/TC-$\nu/\mu$ and $p_{\nu,\mu}(s,t;s',t')$ is a polynomial with variable coefficients of degree $\nu$ in $s$, $s'$, and degree $\mu$ in $t$, $t'$.

Proposition 5.12 together with the definition of GS/TRF-$\nu/\mu$ yields the following result.

Corollary to 5.12. If $X(q)$ is a GS/TRF-$\nu/\mu$ in $R^n \times T$, then

$$c_X(q_1,q_2) = \langle c_X(s,t;s',t'),\ q_1(s,t)\,q_2(s',t')\rangle = \langle k_X(h,\tau),\ q_1(s,t)\,q_2(s',t')\rangle. \tag{5.23}$$


In view of the Corollary to 5.12, condition (5.16), satisfied by all GS/TC-$\nu/\mu$ $k_X(h,\tau)$, can also emerge from the fact that $c_X(q_1,q_2)$ is a nonnegative-definite bilinear functional in $Q_{\nu/\mu}$ which satisfies (5.23). A continuous and symmetric function $k_X(h,\tau)$ in $R^n \times T$ is a permissible GS/TC-$\nu/\mu$ if and only if

$$\langle k_X(h,\tau),\ q(s,t)\,q(s',t')\rangle \ge 0, \tag{5.24}$$

for all $q \in Q_{\nu/\mu}$. We will also say that the $k_X(h,\tau)$ is a conditionally nonnegative-definite function of order $\nu/\mu$.

Let $X(s,t)$ be a differentiable OS/TRF-$\nu/\mu$. By definition the

$$Y_a(s,t) = D^{a}X(s,t) = \frac{\partial^{\nu+\mu+2}}{\partial s_1^{\nu_1}\cdots\partial s_n^{\nu_n}\,\partial t^{\mu+1}}X(s,t) \tag{5.25}$$

is a zero mean space homogeneous/time stationary random field for all $a \in A$, with

$$A = \Big\{a = (\nu^{*}, \mu+1) = (\nu_1, \nu_2, \ldots, \nu_n, \mu+1) : \sum_{i=1}^{n}\nu_i = \nu+1\Big\}.$$

The spectral representation of the covariance of each $Y_a(s,t)$ is $c_{Y_a}(h,\tau) = \iint \exp[i(w\cdot h + \lambda\tau)]\,d\phi_{Y_a}(w,\lambda)$, where $\phi_{Y_a}(w,\lambda)$, $a \in A$, are positive summable measures in $R^n \times T$, without atom at the origin. We define the covariance

$$c_Y(h,\tau) = \sum_{a\in A}c_{Y_a}(h,\tau) = E\Big[\sum_{a\in A}Y_a(s,t)\,Y_a(s',t')\Big] = \iint \exp[i(w\cdot h + \lambda\tau)]\,d\phi_Y(w,\lambda), \tag{5.26}$$

where $\phi_Y(w,\lambda) = \sum_{a\in A}\phi_{Y_a}(w,\lambda)$ is, also, a positive summable measure in $R^n \times T$, without atom at the origin.

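A function $k_X(h,\tau)$ is a permissible GS/TC-$\nu/\mu$ in the sense of Definition 5.11 if and only if it admits the following spectral representation

$$k_X(h,\tau) = \iint \frac{\{\exp[i(w\cdot h)] - P_{2\nu+1}[i(w\cdot h)]\}\,\{\exp(i\lambda\tau) - P_{2\mu+1}(i\lambda\tau)\}}{w^{2\nu+2}\,\lambda^{2\mu+2}}\,d\phi_Y(w,\lambda) + p_{2\nu,2\mu}(h,\tau), \tag{5.27}$$

where

$$P_{2\nu+1}[i(w\cdot h)] = \sum_{\rho=0}^{2\nu+1}\frac{i^{\rho}(w\cdot h)^{\rho}}{\rho!}$$

and

$$P_{2\mu+1}(i\lambda\tau) = \sum_{\zeta=0}^{2\mu+1}\frac{(i\lambda\tau)^{\zeta}}{\zeta!},$$

and $p_{2\nu,2\mu}(h,\tau)$ is an arbitrary polynomial of degree $\le 2\nu$ in $h$ and $\le 2\mu$ in $\tau$.

On the basis, now, of the obvious inequality

$$\big|\exp[i(w\cdot h)] - P_{2\nu+1}[i(w\cdot h)]\big|\,\big|\exp[i\lambda\tau] - P_{2\mu+1}(i\lambda\tau)\big| \le \frac{(w\cdot h)^{2\nu+2}(\lambda\tau)^{2\mu+2}}{(2\nu+2)!\,(2\mu+2)!},$$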

and, since an OS/TRF-$\nu/\mu$ is also an OS/TRF-$\nu'/\mu'$ for all $\nu' > \nu$ and $\mu' > \mu$, it follows that

(5.28)

which assures the existence of the integral (5.27). In view of the foregoing considerations, if $k_X(h,\tau)$ belongs to the GS/TCRS-$\nu/\mu$, then $k_X(h,\tau) + p_{2\nu,2\mu}(h,\tau)$ belongs to it too.
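Clearly, the GS/TC-$\nu/\mu$ satisfies the relation

$$\frac{\partial^{2\nu+2\mu+4}}{\partial h^{2\nu+2}\,\partial\tau^{2\mu+2}}\,k_X(h,\tau) = (-1)^{\nu+\mu}\,c_Y(h,\tau). \tag{5.29}$$

In relation to (5.29), the measure $\phi_Y(w,\lambda)$ is the Fourier transform of

$$(-1)^{\nu+\mu}\frac{\partial^{2\nu+2\mu+4}}{\partial h^{2\nu+2}\,\partial\tau^{2\mu+2}}\,k_X(h,\tau).$$

Employing Proposition 5.9 it is not difficult to show that the representation (5.29) is in general true for any $X(s,t)$, not necessarily differentiable.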

In the case now where $\phi_Y(w,\lambda)$ is differentiable, we can define the generalized spectral density function of order $\nu/\mu$, $K_X(w,\lambda)$, as the n-fold space/time Fourier transform of $k_X(h,\tau)$. The lemma below is an immediate consequence of the preceding spectral analysis.

Lemma 5.13. Let $X(s,t)$ be a differentiable OS/TRF-$\nu/\mu$. A continuous function $k_X(h,\tau)$ in $R^n \times T$ is a permissible GS/TC-$\nu/\mu$ if and only if (5.28) holds true and the corresponding $K_X(w,\lambda)$ exists (in the sense of generalized functions), includes no atom at the origin and is such that the $w^{2\nu+2}\lambda^{2\mu+2}K_X(w,\lambda)$ is a nonnegative measure.

Note that if the space isotropic $c_Y(r,\tau)$, $r = |h|$, is space-time separable, i.e., $c_Y(r,\tau) = c_Y(r)\,c_Y(\tau)$, then the $k_X(r,\tau)$ is separable too, i.e., $k_X(r,\tau) = k_X(r)\,k_X(\tau)$. We shall examine a series of cases of this type below.


Example 5.14. Consider the stochastic partial differential equation (5.4), where $Y(s,t)$ is a zero mean white noise S/TRF in $R^1 \times T$ with covariance

$$c_Y(r,\tau) = \delta(r,\tau) = \delta(r)\,\delta(\tau). \tag{5.30}$$

(5.20) gives

$$k_X(r,\tau) = (-1)^{\nu+\mu}\frac{r^{2\nu+1}\,\tau^{2\mu+1}}{(2\nu+1)!\,(2\mu+1)!}. \tag{5.31}$$

A generalization of the covariance (5.31) in $R^n \times T$ is the isotropic GS/TC-$\nu/\mu$

$$k_X(r,\tau) = \sum_{\rho=0}^{\nu}\sum_{\zeta=0}^{\mu}(-1)^{\rho+\zeta}\,a_{\rho\zeta}\,r^{2\rho+1}\,\tau^{2\zeta+1}, \tag{5.32}$$

where the coefficients $a_{\rho\zeta}$ should satisfy certain permissibility conditions so that the $k_X(r,\tau)$ is a conditionally nonnegative-definite function in the sense of (5.16); see also Lemma 5.13. More precisely the coefficients $a_{\rho\zeta}$ must be such that the following condition is satisfied:

"  ^  G((2p  +  n+ l)/2)[(2p  f  1):lf(2c-.+ D:: 

2_L - p. -  - ‘ 

p  Oi.  0 

$\ge 0$, (5.33)

where $G(\cdot)$ is the gamma function, for all $\omega > 0$ and $\lambda > 0$.

Based now on the observation that an OS/TRF-$\nu/\mu$ of the form (5.1) can be assigned a GS/TC-$\nu/\mu$ of the polynomial form (5.31), the proof of the following proposition is straightforward.

Proposition 5.15. Assume an OS/TRF-$\nu/\mu$ in $R^1 \times T$ can be expressed by


X(s,t)  -  ^ 

p  e  0 


ul'Mt  v)‘- 
pli'.l 


Y(u,  v|  du  dw 


(5.. 54  I 


where the coefficients, indexed by $\rho = 0, 1, \ldots, \nu$ and $\zeta = 0, 1, \ldots, \mu$, are suitable coefficients and $Y(s,t)$ is a zero mean white noise S/TRF in $R^1 \times T$. Then its GS/TC-$\nu/\mu$ is of the form (5.32).


6. Stochastic partial differential equations


Stochastic differential equations over space-time have the general form

$$\mathcal{L}[X(s,t)] = Y(s,t), \tag{6.1}$$

where $X(s,t)$ is the unknown S/TRF, $\mathcal{L}[\cdot]$ is a given operator, and $Y(s,t)$ is a known S/TRF, also called a forcing function. Despite significant progress over the last decade or so, much work remains to be done in the theory of stochastic partial differential equations (SPDE). A partial list of references is given in [2, 4].

We saw above that, by definition, a continuous-parameter OS/TRF-$\nu/\mu$ obeys certain SPDE and the corresponding covariances (ordinary and generalized) satisfy the corresponding deterministic differential equations: If $X(s,t)$ is an OS/TRF-$\nu/\mu$, by definition, all

$$Y_i(s,t) = \frac{\partial^{\nu+\mu+2}}{\partial s_i^{\nu+1}\,\partial t^{\mu+1}}X(s,t) \tag{6.2}$$

are space homogeneous/time stationary S/TRF. Let $\nu = 2m - 1$; the field

$$Y(s,t) = \sum_{i=1}^{n}Y_i(s,t) = \nabla^{\nu+1}\frac{\partial^{\mu+1}}{\partial t^{\mu+1}}X(s,t), \tag{6.3}$$

where $\nabla^{\nu+1} = \sum_{i=1}^{n}\partial^{\nu+1}/\partial s_i^{\nu+1}$, is space homogeneous/time stationary, too. This observation leads to the following result [7].

Proposition 6.1. Let $Y(s,t)$ be a space homogeneous/time stationary field. Then, there is one and only one OS/TRF-$\nu/\mu$ $X(s,t)$ with representations satisfying the differential equation

$$\nabla^{\nu+1}\frac{\partial^{\mu+1}}{\partial t^{\mu+1}}X(s,t) = Y(s,t), \tag{6.4}$$

where $\nu = 2m - 1$.

An immediate consequence of Proposition 6.1 is the corollary below.

Corollary to 6.1. If $X(s,t)$ is an OS/TRF-$\nu/\mu$, then there exists an OS/TRF-$(\nu+2k)/(\mu+2\lambda)$ $Z(s,t)$ such that


a-' 

Xl.s.t)  -  Zl.s.t). 

at*-' 


(6.. a) 


and 


The covariances associated with (6.2) and (6.3) are, respectively
a-' '  ‘ 

C> ,  ■  , —  , . -  c\|.s,  t;.s',  t'l,  (6.(-' 

a. s'  a.s' '  ‘  at “ ' ' at'“ ' 


Olh.Tl  -  (.-Ip ‘-AvIh.Ti  (6.71 

aT‘‘"- 


7. Discrete linear representations of spatiotemporal random fields

The key element in passing from abstract theory to a practical analysis of spatiotemporal data is the development of suitable discrete linear representations of the S/TRF model. This is necessary because real data are usually discretely distributed in space-time.

Let $X(s_i, t_j)$, where $(s_i, t_j) \in R^n \times T$, $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, k$, be a discrete-parameter OS/TRF. Let $q \in Q = Q_m$, where $Q_m$ is the space of real measures on $R^n \times T$ with finite support and such that

$$q(s,t) = \sum_{i=1}^{m}\sum_{j=1_i}^{p_i}q(s_i,t_j)\,\delta(s_i - s,\ t_j - t) = \sum_{i=1}^{m}\sum_{j=1_i}^{p_i}q_{ij}\,\delta_{ij}(s,t), \tag{7.1}$$

where $p_i$ denotes the number of time instances $t_j$ ($j = 1_i, 2_i, \ldots, p_i$) used, given that we are at the spatial position $s_i$.
The corresponding discrete GS/TRF and CS/TRF are, respectively

$$X(q) = \Big\langle \sum_{i=1}^{m}\sum_{j=1_i}^{p_i}q_{ij}\,\delta_{ij}(s,t),\ X(s,t)\Big\rangle = \sum_{i=1}^{m}\sum_{j=1_i}^{p_i}q_{ij}\,X(s_i,t_j), \tag{7.2}$$

and

$$Y_q(s,t) = \Big\langle \sum_{i=1}^{m}\sum_{j=1_i}^{p_i}q_{ij}\,\delta_{ij}(s',t'),\ S_{s,t}X(s',t')\Big\rangle = \sum_{i=1}^{m}\sum_{j=1_i}^{p_i}q_{ij}\,S_{s,t}X(s_i,t_j). \tag{7.3}$$

Definition 7.1. The discrete S/TRF $Y_q(s,t)$ of (7.3) is called a spatiotemporal increment of order $\nu$ in space and $\mu$ in time (S/TI-$\nu/\mu$) on $Q_{\nu/\mu}$ if

$$\sum_{i=1}^{m}\sum_{j=1_i}^{p_i}q_{ij}\,s_i^{\rho}\,t_j^{\zeta} = 0 \tag{7.4}$$

for all $\rho \le \nu$ and $\zeta \le \mu$. In this case the coefficients $q_{ij}$ ($i = 1, 2, \ldots, m$ and $j = 1_i, 2_i, \ldots, p_i$) will be termed admissible coefficients of order $\nu/\mu$ (AC-$\nu/\mu$).

Definition 7.2. The discrete OS/TRF $X(s,t)$ will be called an OS/TRF-$\nu/\mu$ on $Q_{\nu/\mu}$ if the corresponding S/TI-$\nu/\mu$ $Y_q(s,t)$ is a zero mean space homogeneous/time stationary RF.

A summary of continuous S/TRF-related notions and their discrete analogues is given in Figure 7.1.

Example 7.3. Consider the case illustrated in Figure 7.2, where $(s,t) \in R^2 \times T$ and $s = (s_1, s_2)$. Let

$$\begin{aligned}
Y_q(s_1,s_2,t) &= \sum_{i}\sum_{j}q_{ij}\,X(s_{i1}, s_{i2}, t_j)\\
&= X(s_1+\Delta s, s_2, t+\Delta t) - 2X(s_1+\Delta s, s_2, t) + X(s_1+\Delta s, s_2, t-\Delta t)\\
&\quad + X(s_1, s_2+\Delta s, t+\Delta t) - 2X(s_1, s_2+\Delta s, t) + X(s_1, s_2+\Delta s, t-\Delta t)\\
&\quad + X(s_1-\Delta s, s_2, t+\Delta t) - 2X(s_1-\Delta s, s_2, t) + X(s_1-\Delta s, s_2, t-\Delta t)\\
&\quad + X(s_1, s_2-\Delta s, t+\Delta t) - 2X(s_1, s_2-\Delta s, t) + X(s_1, s_2-\Delta s, t-\Delta t)\\
&\quad - 4[X(s_1, s_2, t+\Delta t) - 2X(s_1, s_2, t) + X(s_1, s_2, t-\Delta t)].
\end{aligned} \tag{7.5}$$
[Figure 7.1: S/TRF-related notions and their discrete analogues. Theory: $q(s,t) \in Q_{\nu/\mu}$, $X(q) = \langle q(s,t), X(s,t)\rangle$. Practice: $X(q) = \sum_i\sum_j q_{ij}\,X(s_i,t_j)$.]

It is easily shown that

$$\sum_{i}\sum_{j}q_{ij}\,s_{i1}^{\rho_1}\,s_{i2}^{\rho_2}\,t_j^{\zeta} = 0$$

for all $\rho_1 + \rho_2 \le 1$ and $\zeta \le 1$. Therefore, the $Y_q(s_1,s_2,t)$ above is a S/TI-1/1.
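The admissibility claim of Example 7.3 can be checked numerically. The following is a minimal sketch, assuming the coefficients of (7.5) are the product of a five-point spatial stencil and a second difference in time; the function names `st_increment_coefficients`, `moment` and `is_admissible` are hypothetical helpers introduced only for this illustration.

```python
def st_increment_coefficients(ds=1.0, dt=1.0):
    """Coefficients q_ij of (7.5) as a list of ((a, b, tau), q), where
    (a, b, tau) is the offset of a data point from the centre (s1, s2, t)."""
    # Five-point stencil in space: four neighbours (+1) and the centre (-4).
    space = [((ds, 0.0), 1.0), ((-ds, 0.0), 1.0),
             ((0.0, ds), 1.0), ((0.0, -ds), 1.0), ((0.0, 0.0), -4.0)]
    # Second difference in time: 1, -2, 1.
    time = [(dt, 1.0), (0.0, -2.0), (-dt, 1.0)]
    return [((a, b, tau), ws * wt) for (a, b), ws in space for tau, wt in time]


def moment(coeffs, p1, p2, zeta, origin=(0.0, 0.0, 0.0)):
    """Sum of q_ij * s_i1^p1 * s_i2^p2 * t_j^zeta over all data points."""
    s1, s2, t = origin
    return sum(q * (s1 + a) ** p1 * (s2 + b) ** p2 * (t + tau) ** zeta
               for (a, b, tau), q in coeffs)


def is_admissible(coeffs, nu, mu, origin=(0.0, 0.0, 0.0), tol=1e-9):
    """True if every moment with p1 + p2 <= nu and zeta <= mu vanishes."""
    return all(abs(moment(coeffs, p1, p2, zeta, origin)) <= tol
               for p1 in range(nu + 1)
               for p2 in range(nu + 1 - p1)
               for zeta in range(mu + 1))


coeffs = st_increment_coefficients()
print(is_admissible(coeffs, 1, 1))   # the S/TI-1/1 property of Example 7.3
print(moment(coeffs, 2, 0, 2))       # a higher-order moment that does not vanish
```

Because the stencil is a product of a spatial and a temporal part, a moment vanishes as soon as either factor does; the first moment that survives needs degree two in both space and time.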

Proposition 7.4. Let $X(s,t)$ be an OS/TRF on $Q_{\nu/\mu}$ and let

$$\hat{X}(s_0,t_0) = \sum_{i=1}^{m}\sum_{j=1_i}^{p_i}\lambda_{ij}\,X(s_i,t_j)$$

be the linear estimator of $X(s,t)$ at point/instant $(s_0,t_0)$ such that

$$E[\hat{X}(s_0,t_0) - X(s_0,t_0)] = 0 \tag{7.7}$$

and

$$E[X(s_0,t_0)] = \sum_{\rho}\sum_{\zeta}\eta_{\rho\zeta}\,s_0^{\rho}\,t_0^{\zeta}, \tag{7.8}$$

where $\eta_{\rho\zeta}$ are suitable coefficients. Then the difference

$$\hat{X}(s_0,t_0) - X(s_0,t_0) = \sum_{i=0}^{m}\sum_{j}\lambda_{ij}\,X(s_i,t_j), \tag{7.9}$$

where $\lambda_{00} = -1$ and $\lambda_{i0} = \lambda_{0j} = 0$ ($i, j \ne 0$), is an S/TI-$\nu/\mu$ on $Q_{\nu/\mu}$.

Proof. See [6].

In the discrete framework, a function $k_X(h,\tau)$ in $R^n \times T$ is a generalized spatiotemporal covariance of order $\nu$ in space and $\mu$ in time (GS/TC-$\nu/\mu$) if and only if for all AC-$\nu/\mu$ $\{q_{ij}\}$

$$E[X(q)]^2 = E[Y_q(0,0)]^2 = E\Big[\sum_{i=1}^{m}\sum_{j=1_i}^{p_i}q_{ij}\,X(s_i,t_j)\Big]^2 = \sum_{i=1}^{m}\sum_{j=1_i}^{p_i}\sum_{i'=1}^{m}\sum_{j'=1_{i'}}^{p_{i'}}q_{ij}\,q_{i'j'}\,k_X(h_{ii'},\tau_{jj'}) \ge 0, \tag{7.10}$$

where $h_{ii'} = s_i - s_{i'}$ and $\tau_{jj'} = t_j - t_{j'}$.

8. Optimal estimation of spatiotemporal random fields

8.1. General considerations

In this section, we will deal with the spatiotemporal estimation problem, which has various applications in almost any scientific discipline.

In general, the spatiotemporal estimation problem can be summarized as follows:

Problem 8.1. Let $X(q)$ be a GRS-$\nu/\mu$, and let $\mathcal{H}_{\nu/\mu}$ be the Hilbert space generated by the representations $X(s,t)$ of $X(q)$ (the $X(s,t)$ may represent, for instance, the precipitation, the atmospheric pollution or a meteorologic element at position $s$ at time $t$). Let $X(s_k,t_q) \in \mathcal{H}_{\nu/\mu}$. We want to find estimates $\hat{X}(s_k,t_q)$ of the actual values $X(s_k,t_q)$ of the natural process of interest at unknown positions $s_k$ and time instances $t_q$. The calculations are to be made on the basis of experimental data (observations) $X(s_i,t_j)$, $i = 1, 2, \ldots, m$ and $j = 1_i, 2_i, \ldots, p_i$. More precisely, an estimate $\hat{X}(s_k,t_q)$ is defined as an element of $\mathcal{H}_{\nu/\mu}$ which fulfills the following requirements:

1) Linearity, viz.

$$\hat{X}(s_k,t_q) = \xi^{T}X, \tag{8.1}$$

where $\xi = [\xi_{ij}]$ ($i = 1, 2, \ldots, m$; $j = 1_i, \ldots, p_i$) is a vector of real coefficients $\xi_{ij}$ to be calculated during the estimation process, and $X = [X(s_i,t_j)]$ is a vector of known elements $X(s_i,t_j) \in \mathcal{H}_{\nu/\mu}$, $(s_i,t_j) \in A$, where $A$ is a compact set of data points/time instances. (Figure 8.1 illustrates the $R^1 \times T$ case of such a linear estimator.)

Figure 8.1: The $R^1 \times T$ case of linear space-time estimation.

2) Unbiasedness, i.e.,

$$E[Z(s_k,t_q)] = 0, \tag{8.2}$$

where $Z(s_k,t_q) = \hat{X}(s_k,t_q) - X(s_k,t_q)$.

3) Optimality (minimum mean square error), i.e., it must minimize the estimation error

$$\sigma_X^2(s_k,t_q) = E[Z(s_k,t_q)]^2. \tag{8.3}$$

This is a constrained optimization problem in $R^n \times T$ whose solution depends upon the regularity properties of the random field $X(s,t)$ over space-time.


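8.2. Optimal estimation of space nonhomogeneous/time nonstationary processes

Suppose now that the natural process can be represented by a S/TRF-$\nu/\mu$ $X(s,t)$. On the basis of Proposition 7.4, the $Z(s_k,t_q) = \hat{X}(s_k,t_q) - X(s_k,t_q)$ is a S/TI-$\nu/\mu$, and its variance is given by

$$\sigma_X^2(s_k,t_q) = \sum_{i=1}^{m}\sum_{j=1_i}^{p_i}\sum_{i'=1}^{m}\sum_{j'=1_{i'}}^{p_{i'}}\xi_{ij}\,\xi_{i'j'}\,k_X(h_{ii'},\tau_{jj'}) - 2\sum_{i=1}^{m}\sum_{j=1_i}^{p_i}\xi_{ij}\,k_X(h_{ik},\tau_{jq}) + k_X(0,0) \tag{8.4}$$

($\xi_{kq} = -1$ and $\xi_{kj} = \xi_{iq} = 0$ ($i \ne k$, $j \ne q$)). The fact that $Z(s_k,t_q)$ is a S/TI-$\nu/\mu$ implies that

$$\sum_{i=1}^{m}\sum_{j=1_i}^{p_i}\xi_{ij}\,s_i^{\rho}\,t_j^{\zeta} = s_k^{\rho}\,t_q^{\zeta} \tag{8.5}$$

for all $0 \le |\rho| \le \nu$ and $0 \le \zeta \le \mu$. (Note that (8.5) is, also, the unbiasedness condition (8.2).) The minimization of (8.4) with respect to the $\xi_{ij}$'s subject to the constraint (8.5) yields the system of equations

$$K\,\xi^{*} = k, \tag{8.6}$$

where $K = [k_X(s_i,t_j;s_{i'},t_{j'}),\ s_i^{\rho}t_j^{\zeta};\ i, i' = 1, 2, \ldots, m;\ j = 1_i, \ldots, p_i;\ j' = 1_{i'}, \ldots, p_{i'};\ |\rho| \le \nu,\ \zeta \le \mu]$ is a matrix of GS/TC-$\nu/\mu$ and space-time polynomials; $\xi^{*} = [\xi_{ij},\ \eta_{\rho\zeta};\ i = 1, 2, \ldots, m;\ j = 1_i, \ldots, p_i;\ \rho = |\rho| \le \nu;\ \zeta \le \mu]$ is a vector of coefficients $\xi_{ij}$ which includes the Lagrange multipliers $\eta_{\rho\zeta}$; and the vector $k = [k_X(s_k,t_q;s_i,t_j),\ s_k^{\rho}t_q^{\zeta};\ i = 1, 2, \ldots, m;\ j = 1_i, \ldots, p_i;\ |\rho| \le \nu,\ \zeta \le \mu]$.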
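The role of the unbiasedness constraint can be illustrated with a small check. This is a sketch only, assuming the constraint (8.5) reads $\sum_i\sum_j \xi_{ij}s_i^{\rho}t_j^{\zeta} = s_k^{\rho}t_q^{\zeta}$ in $R^1 \times T$; the helper `unbiasedness_residuals` and the bilinear weights used as an example are hypothetical and are not the optimal weights obtained from (8.6).

```python
def unbiasedness_residuals(weights, points, target, nu, mu):
    """Residual of the assumed constraint (8.5) for each (rho, zeta).

    weights : estimator coefficients, one per data point
    points  : data locations (s_i, t_j) in R^1 x T
    target  : estimation location (s_k, t_q)
    Returns {(rho, zeta): sum_ij xi_ij s_i^rho t_j^zeta - s_k^rho t_q^zeta}.
    """
    s0, t0 = target
    return {
        (rho, zeta): sum(w * s ** rho * t ** zeta
                         for w, (s, t) in zip(weights, points))
        - s0 ** rho * t0 ** zeta
        for rho in range(nu + 1)
        for zeta in range(mu + 1)
    }


# Four data points on the corners of a unit space-time cell.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
target = (0.3, 0.6)
s0, t0 = target
# Bilinear interpolation weights: one simple (non-optimal) admissible choice.
weights = [(1 - s0) * (1 - t0), s0 * (1 - t0), (1 - s0) * t0, s0 * t0]

for key, value in unbiasedness_residuals(weights, points, target, 1, 1).items():
    print(key, value)
```

Any weight vector that passes this check for all $\rho \le \nu$, $\zeta \le \mu$ gives an unbiased estimator under the polynomial mean of (7.8); the system (8.6) then selects, among such vectors, the one with minimum variance.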


9. Simulation in space-time

Several of the spatial simulation approaches can be extended in order to produce realizations of spatiotemporal random fields (S/TRF).

By means of the ST simulation concept [5, 8] the space n-dimensional × time random field $X_n(s,t)$, where $(s,t) \in R^n \times T$, can be simulated by summing contributions from several random processes $X_{1,\theta_i}(s_i,t)$, where $s_i = s\cdot\theta_i$, $(s_i,t) \in R^1 \times T$; viz.

$$X_n(s,t) = \frac{1}{\sqrt{N}}\sum_{i=1}^{N}X_{1,\theta_i}(s_i,t), \tag{9.1}$$

in which $N$ is the number of simulation lines.

On-line  realizations  of  the  S/TRF  Xi  (si,  t)  can  be  generated  in  terms 
of  its  spectral  density  function 


C,,9(cO,A)  =<fCn{w.X)l, 


(9.2) 


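where $w = \omega\theta$, by using the simulation formula

$$X_1(s,t) = \sum_{i=1}^{m}\sum_{k=1}^{K}\sqrt{2\,C_{1,\theta}(\omega_i,\lambda_k)\,\Delta\omega_i\,\Delta\lambda_k}\ \cos(\omega_i s - 2\pi\lambda_k t + \phi_{i,k}), \tag{9.3}$$

where the phase angles $\phi_{i,k}$ are distributed randomly but uniformly within $[0, 2\pi]$.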
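The two simulation steps can be sketched in code. This is an illustration under stated assumptions, not the author's implementation: the amplitude in (9.3) is read as $\sqrt{2\,C\,\Delta\omega\,\Delta\lambda}$, the same one-dimensional spectral density is used on every line, and the names `line_process`, `turning_bands_field` and `gaussian_density` (a hypothetical density chosen only for the demonstration) are introduced here.

```python
import math
import random


def line_process(density, omegas, lambdas, d_omega, d_lambda, rng):
    """One realization of X_1(s, t) on a line: a sum of cosines with
    independent phases drawn uniformly from [0, 2*pi], as in (9.3)."""
    terms = []
    for w in omegas:
        for lam in lambdas:
            # Assumed amplitude normalization: sqrt(2 * C * d_omega * d_lambda).
            amp = math.sqrt(2.0 * density(w, lam) * d_omega * d_lambda)
            terms.append((amp, w, lam, rng.uniform(0.0, 2.0 * math.pi)))

    def x1(s, t):
        return sum(a * math.cos(w * s - 2.0 * math.pi * lam * t + ph)
                   for a, w, lam, ph in terms)

    x1.amplitude_bound = sum(a for a, _, _, _ in terms)
    return x1


def turning_bands_field(density, directions, omegas, lambdas,
                        d_omega, d_lambda, seed=0):
    """X_n(s, t) = N^(-1/2) * sum_i X_{1,theta_i}(s . theta_i, t), as in (9.1)."""
    rng = random.Random(seed)
    lines = [(theta, line_process(density, omegas, lambdas,
                                  d_omega, d_lambda, rng))
             for theta in directions]
    scale = 1.0 / math.sqrt(len(lines))

    def xn(s, t):
        # Project the position s onto each line direction theta.
        return scale * sum(
            x1(sum(si * ti for si, ti in zip(s, theta)), t)
            for theta, x1 in lines)

    xn.amplitude_bound = scale * sum(x1.amplitude_bound for _, x1 in lines)
    return xn


def gaussian_density(w, lam):
    """Hypothetical spectral density used only for this demonstration."""
    return math.exp(-(w * w + lam * lam))


directions = [(math.cos(k * math.pi / 8), math.sin(k * math.pi / 8))
              for k in range(8)]
omegas = [0.25 * (i + 0.5) for i in range(16)]
lambdas = [0.1 * (k + 0.5) for k in range(10)]
field = turning_bands_field(gaussian_density, directions, omegas, lambdas,
                            0.25, 0.1, seed=42)
print(field((1.0, 2.0), 0.5))
```

The returned closure can be evaluated at any $(s,t)$, so a realization on a grid is obtained by looping over grid nodes; fixing the seed makes the realization reproducible.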

Spatiotemporal  simulation  is  a  valuable  tool  in  the  context  of  random 
moving  surfaces  studies,  such  as  sea  waves  and  their  action  on  structures, 
atmospheric  pollutants  and  meteorological  elements.  Also,  the  simulation 
method  may  be  used  to  develop  a  spatiotemporal  model  for  rainfall  genera¬ 
tion.  Space-time  rainfall  simulations  can  be  used  in  evaluating  strategies  for 
satellite  remote  sensing  of  rainfall  and  for  studying  storm  runoff  problems. 
We discuss the reconstruction of bandlimited functions from randomly sampled values and give an algorithm that works provided the sampling density is above the Nyquist rate.

Let $f$ be of finite energy and bandlimited with bandwidth $2\omega$, i.e., $f \in L^2(R)$, supp $\hat{f} \subset [-\omega, \omega]$, and let $\ldots < x_{i-1} < x_i < x_{i+1} < \ldots$ be a random sampling set such that its density $\delta := \sup_i (x_{i+1} - x_i) < \pi/\omega$, i.e., arbitrarily close to the Nyquist rate. If $f_n$ is the result of the algorithm after $n$ iterations, then the rate of convergence of $f_n$ to the original function $f$ is $\|f - f_n\|_2 \le (\delta\omega/\pi)^{n+1}(\pi + \delta\omega)(\pi - \delta\omega)^{-1}\|f\|_2$. This allows for good estimates of the number of iterations required to achieve a certain reconstruction accuracy.
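As an illustration of how such an estimate might be used, the sketch below computes the number of iterations needed for a given relative accuracy. It assumes the bound is read as $\gamma^{n+1}(1+\gamma)/(1-\gamma)$ with $\gamma = \delta\omega/\pi$; the function names `error_bound` and `iterations_needed` are hypothetical.

```python
import math


def error_bound(n, delta, omega):
    """Assumed bound on ||f - f_n|| / ||f|| after n iterations:
    gamma**(n + 1) * (1 + gamma) / (1 - gamma), gamma = delta * omega / pi."""
    gamma = delta * omega / math.pi
    if not 0.0 <= gamma < 1.0:
        raise ValueError("the bound requires delta < pi / omega")
    return gamma ** (n + 1) * (1.0 + gamma) / (1.0 - gamma)


def iterations_needed(eps, delta, omega):
    """Smallest n for which the assumed bound drops to eps or below."""
    n = 0
    while error_bound(n, delta, omega) > eps:
        n += 1
    return n


# Maximal gap equal to half the Nyquist spacing (gamma = 0.5).
print(iterations_needed(1e-3, 0.5 * math.pi, 1.0))
```

The count grows quickly as the maximal gap $\delta$ approaches $\pi/\omega$, since $\gamma$ then tends to 1 and the factor $(1-\gamma)^{-1}$ blows up.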

In  contrast  to  recent  reconstruction  methods:  (1)  an  explicit  and  opti¬ 
mal  estimate  for  the  sampling  density  required  for  the  convergence  of  the 
algorithm  is  derived,  and  (2)  the  algorithm  functions  independently  of  the 
sampling  geometry — as  long  as  the  sampling  density  is  higher  than  the 
Nyquist  rate. 


1.  Introduction 

In many applications the problem arises of whether a bandlimited function $f$ is uniquely determined by its nonuniformly sampled values $f(x_n)$ and how it can be reconstructed from these samples.

In this article we discuss and compare various quantitative results on nonuniform sampling. At the 1989 ASI we outlined a new approach to nonuniform sampling which contains a new generation of iterative reconstruction algorithms [10]. In theory, these algorithms have all properties that are required for a good reconstruction algorithm; they are stable, converge for a large class of norms, possess good localization properties and work in any dimension [7, 6, 5]. However, for the algorithm to work, the sampling set was required to be "sufficiently dense." Thus it was not at all clear whether the algorithms would converge for realistic sampling sets close to the Nyquist density, and it was unknown how they would perform in practice.

† The author acknowledges partial support by grant AFOSR-90-0311.

Since  then  the  numerical  implementation  and  comparison  with  other 
methods  has  produced  very  convincing  results  in  favor  of  this  new  class  of 
reconstruction  algorithms  [2, 9]. 

The  objective  of  this  article  is  to  provide  some  sharp  estimates  which 
explain  the  success  of  the  new  method.  In  Section  4  it  is  shown  that  for  the 
version  of  the  algorithm  that  has  been  implemented,  the  required  sampling 
density  is  arbitrarily  close  to  the  Nyquist  rate.  Explicit  estimates  are  given 
for  the  rate  of  convergence  of  the  iterative  algorithm.  It  is  then  compared 
with  other  methods  that  have  been  proposed  for  the  complete  reconstruction 
of  bandlimited  functions. 

Let $L^2(R)$ denote the Hilbert space of square-integrable functions on $R$ with norm $\|f\| = (\int_{-\infty}^{\infty}|f(x)|^2\,dx)^{1/2}$. For $\omega > 0$,

$$B_\omega^2 = \{f \in L^2(R) : \text{supp}\,\hat{f} \subset [-\omega, \omega]\}$$

denotes the closed subspace of square-integrable bandlimited functions with bandwidth $\omega$. Here the Fourier transform is defined by $\hat{f}(\xi) = \int f(x)\,e^{-i\xi x}\,dx$. $\chi_A(x)$ is the characteristic function of a set $A$.

$B_\omega^2$ is a Hilbert space with reproducing kernel $L_y\,\text{sinc}\,\omega(x) = \sin\omega(x-y)/(\omega(x-y))$, where $\text{sinc}\,x = \sin x/x$ and $L_y f(x) = f(x-y)$ denotes the shift operator. In other words,

$$f(y) = \int_R f(x)\,\text{sinc}\,\omega(x-y)\,dx. \tag{1.1}$$

$B_\omega^2$ has the orthonormal basis $\{\text{sinc}\,\omega(x - \pi n/\omega),\ n \in Z\}$. A combination of both facts yields the cardinal series for $f \in B_\omega^2$

$$f(x) = \sum_{n\in Z}f(\pi n/\omega)\,\text{sinc}\,\omega(x - \pi n/\omega).$$

The sampling rate $(\pi/\omega)^{-1}$ is the so-called Nyquist density. It is the lowest density at which complete reconstruction is possible in a stable way [16].

Since $L_y\,\text{sinc}\,\omega x$ is the reproducing kernel, the reconstruction of $f$ from $f(x_n) = \int f(x)\,\text{sinc}\,\omega(x - x_n)\,dx$ is a version of the moment problem. Statements about sampling are therefore equivalent to statements about spanning properties of the sequence $L_{x_n}\text{sinc}\,\omega x$. Thus conditions when a sequence $L_{x_n}\text{sinc}\,\omega x$, $n \in Z$, or equivalently the sequence of its Fourier transforms $(L_{x_n}\text{sinc}\,\omega x)^{\wedge}$, constitutes (1) a Riesz basis, or (2) a frame, or (3) a weighted frame for $B_\omega^2$ ($L^2([-\omega,\omega])$ respectively) lead naturally to sampling theorems.
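The cardinal series can be evaluated directly once it is truncated to finitely many terms. The following is a minimal sketch; the function names and the test function $f(x) = \text{sinc}\,\omega(x - a)$ are choices made for this illustration, and the truncation to $|n| \le$ `n_terms` introduces an error that the exact series does not have.

```python
import math


def sinc(x):
    """sinc x = sin x / x, with the removable singularity filled in at 0."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def cardinal_series(f, omega, x, n_terms):
    """Truncated cardinal series: sum over |n| <= n_terms of
    f(pi n / omega) * sinc(omega * (x - pi n / omega))."""
    step = math.pi / omega  # Nyquist spacing of the uniform samples
    return sum(f(n * step) * sinc(omega * (x - n * step))
               for n in range(-n_terms, n_terms + 1))


omega, a = 2.0, 0.37


def f(x):
    # A shifted sinc: its spectrum is supported in [-omega, omega].
    return sinc(omega * (x - a))


x = 1.234
print(f(x), cardinal_series(f, omega, x, 2000))
```

For this slowly decaying test function the truncation error shrinks only like the reciprocal of `n_terms`, which is one practical reason the uniform series is rarely used as is.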
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Sections  2  through  4  explain  the  reconstruction  methods  related  to 
these  possibilities  and  discuss  their  advantages  and  disadvantages  in  the 
numerical  treatment  of  nonuniform  sampling.  Section  5  contains  the  proof  of 
Theorem  4. 1 ,  which  is  the  most  recent  and  currently  most  efficient  algorithm. 

2.  Kadec's 1/4-Theorem

The first sampling theorem is implicit in the work of Paley and Wiener [20] on nonharmonic Fourier series and deals with the question when a sequence of exponentials $e^{ix_n\xi}$, $n \in \mathbb{Z}$, is a basis for $L^2(-\omega,\omega)$. The sharp constant 1/4 is due to Kadec [15]. For statements in the engineering literature see [1, 13, 19, 23].

Recall that a sequence $e_n$, $n \in \mathbb{Z}$, is called a Riesz basis of a Hilbert space $\mathcal{H}$, if it is the image of an orthonormal basis of $\mathcal{H}$ under a bounded, invertible, linear operator.

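Theorem 2.1. If $x_n$, $n \in \mathbb{Z}$, is a nonuniform sampling set for which

$$\left| x_n - \frac{\pi n}{\omega} \right| \le L < \frac{\pi}{4\omega}, \qquad n \in \mathbb{Z}, \qquad (2.1)$$

then there exists a sequence $g_n$ in $B_\omega^2$ such that for every $f \in B_\omega^2$

$$f(x) = \sum_{n\in\mathbb{Z}} f(x_n)\, g_n(x) \qquad (2.2)$$

where the series converges in the $L^2$-norm. The collection $\operatorname{sinc}\omega(x - x_n)$, $n \in \mathbb{Z}$, is a Riesz basis for $B_\omega^2$ with biorthogonal system $g_n$.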
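Condition (2.1) is easy to test on a finite window of sampling points. The helper below is a minimal sketch under that restriction: the name `satisfies_kadec`, the window size and the two example sequences are hypothetical, and a finite check can only refute the condition, not establish it for the whole infinite sequence.

```python
import numpy as np

def satisfies_kadec(x, n, omega):
    """True if max_n |x_n - pi*n/omega| < pi/(4*omega) on the given finite window."""
    deviation = np.max(np.abs(np.asarray(x, dtype=float) - np.pi * np.asarray(n) / omega))
    return bool(deviation < np.pi / (4 * omega))

omega = np.pi                                   # Nyquist spacing pi/omega equals 1
n = np.arange(-50, 51)
rng = np.random.default_rng(0)
jittered = n + rng.uniform(-0.2, 0.2, size=n.size)   # jitter stays below 1/4
too_wild = n + np.where(n % 2 == 0, 0.3, -0.3)       # jitter exceeds 1/4
ok_jittered = satisfies_kadec(jittered, n, omega)
ok_wild = satisfies_kadec(too_wild, n, omega)
```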


2.1.  Advantages 

•  The required sampling rate is exactly the Nyquist rate.

•  Since the proof of this theorem is based on the inversion of a linear operator by a Neumann series [24], the reconstruction of $f$ from the samples $f(x_n)$ could be formulated as an iterative algorithm, see Section 3 and Section 4.

•  The functions $g_n$ of the biorthogonal basis are known explicitly in terms of Lagrange interpolation functions [17]. Let $g(x) = (x - x_0)\prod_{n=1}^{\infty}(1 - x/x_n)(1 - x/x_{-n})$, then $g_n(x) = g(x)/((x - x_n)\,g'(x_n))$.

•  Both collections $\operatorname{sinc}\omega(x - x_n)$, $n \in \mathbb{Z}$, and $g_n$, $n \in \mathbb{Z}$, are linearly independent. This fact has two important consequences:

1)  The coefficients $c_n$ in the expansion $f = \sum_n c_n g_n$ are uniquely determined, namely $c_n = f(x_n)$.

2)  Interpolation: For every sequence $\lambda_n \in \ell^2$ there is an $f \in B_\omega^2$ such that $f(x_n) = \lambda_n$, specifically $f = \sum_n \lambda_n g_n$.

2.2.  Disadvantages

•  The sampling sets $x_n$, $n \in \mathbb{Z}$, are restricted to jittered versions of regular sampling.

•  Although "explicitly" known, the $g_n$'s are too complicated for use in numerical work.

•  The $\operatorname{sinc}\omega(x - x_n)$, $n \in \mathbb{Z}$, are linearly independent. This implies that a function cannot be reconstructed completely if even only one sample is missed.

•  Instability: If only one sampling location is changed, then all $g_n$'s change drastically [21].

The  applicability  of  Kadec's  theorem  in  numerical  work  seems  to  be  limited. 

3.  Duffin-Schaeffer's theorem on frames

The requirement that translates of sinc form a basis for $B_\omega^2$ is too strong to allow random sampling. A more realistic approach to sampling demands:

•  the signal is uniquely determined by the samples, in other words, $L_{x_n}\operatorname{sinc}\omega x$ spans $B_\omega^2$, and

•  the sampling is stable.

Both requirements can be expressed by

$$\|f\| \le C \left( \sum_{n\in\mathbb{Z}} |f(x_n)|^2 \right)^{1/2}. \qquad (3.1)$$

The  underlying  abstract  concepts  were  introduced  in  the  fundamental 
paper  by  Duffin  and  Schaeffer  [3].  For  an  exposition  of  this  and  related 
material  in  the  context  of  nonharmonic  Fourier  series  see  also  the  monograph 
of  R.  Young  [24]. 

A sequence $e_n$, $n \in \mathbb{Z}$, in a (separable) Hilbert space $\mathcal{H}$ is called a frame for $\mathcal{H}$, if there are two constants $A, B > 0$, such that for all $f \in \mathcal{H}$

$$A\|f\|_{\mathcal{H}}^2 \le \sum_n |\langle e_n, f\rangle|^2 \le B\|f\|_{\mathcal{H}}^2. \qquad (3.2)$$

The constants $A$ and $B$ are called the frame bounds. From (3.2) it follows that $f$ is uniquely determined by the frame coefficients $\langle e_n, f\rangle$. It is remarkable that the equivalence of norms (3.2) implies a simple iterative reconstruction method. Define the frame operator $S$ by

$$Sf = \sum_n \langle e_n, f\rangle\, e_n, \qquad (3.3)$$
then by (3.2) $S$ is bounded below and above, and consequently $S$ is invertible on $\mathcal{H}$. Setting $\alpha = 2/(A + B)$, $f$ can be reconstructed recursively by

$$\varphi_0 = \alpha S f, \qquad \varphi_{n+1} = \varphi_n - \alpha S\varphi_n, \qquad f = \sum_{n=0}^{\infty} \varphi_n. \qquad (3.4)$$

If $f_n = \sum_{k=0}^{n} \varphi_k$ is the result after $n$ iterations, then

$$\|f - f_n\| \le \frac{B}{A}\left(\frac{B - A}{B + A}\right)^{n+1} \|f\| \qquad (3.5)$$

and $f_n$ converges to $f$ at a geometric rate. $\alpha$ is called the relaxation parameter. The precise value of $\alpha$ in (3.4) is not crucial; the algorithm converges for all values of $\alpha$ between 0 and $2/\|S\|$. The choice $\alpha = 2/(A + B)$ yields the fastest convergence and the best numerical results.
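The recursion (3.4) can be tried out in a finite-dimensional setting, where the frame bounds are simply the extreme eigenvalues of the frame operator. The sketch below is an illustration under assumptions that are not part of the article: the Hilbert space is $\mathbb{R}^8$, the frame consists of 40 random Gaussian vectors, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
d, m = 8, 40
E = rng.standard_normal((m, d))          # rows e_n: a generic (redundant) frame of R^d

def S(f):
    """Frame operator S f = sum_n <e_n, f> e_n."""
    return E.T @ (E @ f)

eigs = np.linalg.eigvalsh(E.T @ E)       # optimal frame bounds are the extreme eigenvalues
A, B = eigs[0], eigs[-1]
alpha = 2.0 / (A + B)                    # relaxation parameter
gamma = (B - A) / (B + A)                # contraction factor of Id - alpha*S

f = rng.standard_normal(d)
phi = alpha * S(f)                       # phi_0 = alpha S f (uses only the frame coefficients)
f_n = phi.copy()
errors = [np.linalg.norm(f - f_n)]
for _ in range(60):
    phi = phi - alpha * S(phi)           # phi_{n+1} = phi_n - alpha S phi_n
    f_n += phi
    errors.append(np.linalg.norm(f - f_n))
```

In this symmetric finite-dimensional case the error after $n$ iterations is $(\mathrm{Id} - \alpha S)^{n+1} f$, so it is bounded by $\gamma^{n+1}\|f\|$, which is consistent with (and slightly sharper than) the estimate (3.5).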

In [3], Duffin and Schaeffer give conditions under which the sequence $L_{x_n}\operatorname{sinc}\omega x$ is a frame for $B_\omega^2$. For a converse see [14]. Most algorithms that were considered in engineering are reformulations or slight modifications of the frame method [is, 18, 22, 23].


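Theorem 3.1.  If for some constants $\alpha$, $D$ and $\gamma < 1$ the sampling set $x_n$, $n \in \mathbb{Z}$, satisfies $|x_n - x_m| \ge \alpha > 0$ for $m \ne n$ and

$$\left| x_n - \gamma\,\frac{\pi n}{\omega} \right| \le D, \qquad n \in \mathbb{Z}, \qquad (3.6)$$

then the sequence $L_{x_n}\operatorname{sinc}\omega x$ is a frame for $B_\omega^2$. There exist $A, B > 0$ such that for all $f \in B_\omega^2$

$$A\|f\|^2 \le \sum_{n\in\mathbb{Z}} |f(x_n)|^2 \le B\|f\|^2. \qquad (3.7)$$

Consequently $f \in B_\omega^2$ can be reconstructed by the algorithm (3.4) with the frame operator

$$Sf(x) = \sum_{n\in\mathbb{Z}} f(x_n)\,\operatorname{sinc}\omega(x - x_n). \qquad (3.8)$$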


3.1.  Advantages 

•  The sampling rate in (3.6), i.e., the average number of samples in an interval of length 1, is $(\gamma\pi/\omega)^{-1}$ and thus arbitrarily close to the Nyquist rate.

•  The sampling set may be fairly irregular and may have gaps of a fixed maximal length $D$, provided that they are compensated by more samples between the gaps.

•  The sequence $L_{x_n}\operatorname{sinc}\omega x$ is overcomplete in $B_\omega^2$ and linearly dependent. The practical consequence of the overcompleteness cannot be overestimated. It means that even if a finite number of samples is missed, the signal $f$ can still be completely recovered (possibly at a slower rate of convergence of the algorithm). It is this property which makes "inexact" frames so useful and often preferable to orthogonal or Riesz bases.

3.2.  Disadvantages 

•  Since the proof in [3] uses heavy complex analysis, explicit numerical estimates for the frame bounds $A$ and $B$ are hard to come by. Therefore a suitable relaxation parameter has to be determined by experiment. Thus estimate (3.5) is not as useful as it looks at first glance.

•  The sampling set is still just a perturbation of the regular sampling set $\gamma\pi n/\omega$, $n \in \mathbb{Z}$. The average number of samples in an interval of length 1 is $(\gamma\pi/\omega)^{-1}$. This excludes sampling sets with local variations of the density and thus many situations of practical interest. For instance, in sections of high interest one might want to sample at 10-fold the Nyquist rate, in sections of less interest just above the Nyquist rate. Of course, in order to satisfy (3.6), the excess samples could be dropped. But in most applications it seems quite unreasonable to throw away substantial information just to make the algorithm converge. Rather, one would look for a better algorithm.

Let us also mention that from numerical experiments it is known that when gaps alternate with bunches of samples in accordance with (3.6), then the algorithm converges rather slowly.

•  Since the functions $L_{x_n}\operatorname{sinc}\omega x$ are linearly dependent, the interpolation problem $f(x_n) = \lambda_n$ does not have a solution $f \in B_\omega^2$ for all sequences $\lambda_n \in \ell^2$.

4.  A  new  algorithm  and  weighted  frames 

This section explains a new algorithm which overcomes most of the difficulties of the methods in Section 2 and Section 3. This algorithm emerges as a simplified version of a new generation of reconstruction algorithms which were explored in [4, 7, 6, 8, 5]. The material on the quantitative theory with sharp estimates is taken from [11].

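Let the sampling set $x_n$, $n \in \mathbb{Z}$, be arranged in increasing order, $\dots < x_{n-1} < x_n < x_{n+1} < \dots$. Denote the midpoints by $y_n = (x_n + x_{n+1})/2$ and set $\chi_n = \chi_{[y_{n-1},\,y_n)}$. With $\delta$ the maximal gap defined in (4.1) below, one has $y_n - x_n \le \delta/2$, $x_n - y_{n-1} \le \delta/2$ and $\sum_n \chi_n(x) = 1$ for all $x$. $P$ denotes the orthogonal projection from $L^2(\mathbb{R})$ onto $B_\omega^2$ and is defined by $(Pf)^\wedge = \chi_{[-\omega,\omega]}\,\hat f$.

Theorem 4.1.

Reconstruction: If

$$\delta = \sup_n\,(x_{n+1} - x_n) < \pi/\omega, \qquad (4.1)$$

then every $f \in B_\omega^2$ can be completely reconstructed from the sampling values $f(x_n)$ by the following algorithm:

$$\varphi_0 = P\Big(\sum_n f(x_n)\chi_n\Big) \qquad (4.2)$$

$$\varphi_{k+1} = \varphi_k - P\Big(\sum_n \varphi_k(x_n)\chi_n\Big) \qquad (4.3)$$

$$f = \sum_{k=0}^{\infty} \varphi_k \qquad (4.4)$$

where all sums converge in $L^2(\mathbb{R})$.

Rate of convergence: Let $f_n = \sum_{k=0}^{n} \varphi_k$ be the resulting approximation of $f$ after $n$ iterations of (4.3). Then

$$\|f - f_n\| \le \left(\frac{\delta\omega}{\pi}\right)^{n+1} \frac{\pi + \delta\omega}{\pi - \delta\omega}\,\|f\|. \qquad (4.5)$$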
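The iteration (4.2)-(4.4) can be imitated in a discrete, periodic model. The sketch below is only an analogue under stated assumptions, not the continuous algorithm of the theorem: signals live on a cyclic grid of length `N`, "bandlimited" means that only the DFT bins with $|k| \le M$ are non-zero, the projection $P$ is realised by zeroing the other bins, and the sets $\chi_n$ are the nearest-neighbour cells of the sampling positions on the grid. All names and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 256, 4                                   # grid length, highest DFT frequency kept

# Random signal with DFT support on |k| <= M (discrete stand-in for a bandlimited function).
mask = np.zeros(N, dtype=bool)
mask[: M + 1] = True
mask[N - M:] = True
coeffs = np.zeros(N, dtype=complex)
coeffs[mask] = rng.standard_normal(mask.sum()) + 1j * rng.standard_normal(mask.sum())
f = np.fft.ifft(coeffs)

def P(g):
    """Orthogonal projection onto the band: zero all DFT bins outside the mask."""
    return np.fft.ifft(np.where(mask, np.fft.fft(g), 0))

# Irregular sampling positions; every gap (including the wrap-around gap) is at most 6.
steps = rng.integers(1, 7, size=N)
xs = np.concatenate(([0], np.cumsum(steps)))
xs = xs[xs < N]

# Index of the nearest sampling point for every grid position (circular distance).
j = np.arange(N)
dist = np.abs(j[:, None] - xs[None, :])
dist = np.minimum(dist, N - dist)
nearest = np.argmin(dist, axis=1)

def A(g_samples):
    """A g = P(sum_n g(x_n) chi_n): piecewise constant interpolation, then projection."""
    return P(g_samples[nearest])

samples = f[xs]
f_n = A(samples)                                # phi_0
rel_err = [np.linalg.norm(f - f_n) / np.linalg.norm(f)]
for _ in range(40):
    f_n = f_n + A(samples - f_n[xs])            # same as adding phi_{k+1} = phi_k - A phi_k
    rel_err.append(np.linalg.norm(f - f_n) / np.linalg.norm(f))
```

With gaps of at most 6 grid points and a band of 9 DFT bins on 256 points, the sampling is far denser than the discrete counterpart of the Nyquist distance, so the relative error drops geometrically to the level of floating-point rounding.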


From  the  estimates  in  the  proof,  we  obtain  the  following  important 
corollary  on  the  stability  of  the  algorithm  and  an  alternate  reconstruc¬ 
tion  method. 

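Corollary to 4.1.  If $w_n$ denotes the weights $w_n = \int \chi_n(x)\,dx = y_n - y_{n-1}$, then

$$\left(1 - \frac{\delta\omega}{\pi}\right)^2 \|f\|^2 \le \sum_n |f(x_n)|^2\, w_n \le \left(1 + \frac{\delta\omega}{\pi}\right)^2 \|f\|^2. \qquad (4.6)$$

In other words, the sequence

$$\{\sqrt{w_n}\,\operatorname{sinc}\omega(x - x_n),\ n \in \mathbb{Z}\}$$

is a (weighted) frame for $B_\omega^2$ with the explicit frame bounds $A = (1 - \delta\omega/\pi)^2$ and $B = (1 + \delta\omega/\pi)^2$. According to (3.4), this yields the following reconstruction of $f$. Let

$$(S_w f)(x) = \frac{2}{A + B}\,\frac{\omega}{\pi} \sum_n f(x_n)\, w_n \operatorname{sinc}\omega(x - x_n) \qquad (4.7)$$

be the weighted frame operator, then

$$\varphi_0 = S_w f, \qquad \varphi_{k+1} = \varphi_k - S_w\varphi_k, \qquad f = \sum_{k=0}^{\infty} \varphi_k. \qquad (4.8)$$

Since $2/(A + B) = (1 + \delta^2\omega^2/\pi^2)^{-1}$, $\gamma = (B - A)/(B + A) = 2\delta\omega/(\pi + \delta^2\omega^2/\pi)$, and $B/A = (\pi + \delta\omega)^2/(\pi - \delta\omega)^2$, the rate of convergence of $f_n = \sum_{k=0}^{n} \varphi_k$ to $f$ is as in (4.5)

$$\|f - f_n\| \le \gamma^{\,n+1}\, \frac{(\pi + \delta\omega)^2}{(\pi - \delta\omega)^2}\,\|f\|.$$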

The  proofs  of  Theorem  4.1  and  its  corollary  are  given  in  the  final  section. 
Besides  the  general  advantages  and  disadvantages  of  the  frame  method 
which  were  discussed  in  the  previous  section,  we  would  like  to  emphasize 
the  following  peculiarities  of  the  new  method. 


4.1.  Advantages 

•  The algorithm converges whenever the largest gap between the samples is smaller than the Nyquist distance $\pi/\omega$.

•  Since no other conditions are imposed, the sampling set may be truly random; in particular, the algorithm handles local variations of the density quite well. Because of the use of the weights $w_n$, the sequence $\sqrt{w_n}\,L_{x_n}\operatorname{sinc}\omega x$ is a frame in many situations where Theorem 3.1 does not apply, for instance, when there is no positive minimal distance between the sampling points.

•  The proof is much simpler than for the theorems of Kadec and Duffin-Schaeffer. All constants are explicit in terms of the maximal gap length and the size of the spectrum. The explicit calculation of the frame bounds and of the rate of convergence allows the number of iterations that are necessary to achieve a given accuracy to be determined a priori. For instance, in order to obtain an accuracy of 0.1% on a CD-player with 4-fold oversampling ($\delta\omega/\pi = 1/4$), only five iterations are necessary.

•  Since after the removal of a finite number of points the sampling set still satisfies either (4.1) or (3.6), the algorithm provides a complete reconstruction of the signal even when a finite number of samples are missing or lost.
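The a priori iteration count mentioned above follows directly from the error bound (4.5). A minimal sketch, in which the function name `iterations_needed` and its argument names are hypothetical:

```python
def iterations_needed(ratio, tol):
    """Smallest n with ratio**(n+1) * (1 + ratio) / (1 - ratio) <= tol.

    ratio stands for delta*omega/pi from (4.1) and must lie strictly between 0 and 1;
    tol is the desired relative accuracy ||f - f_n|| / ||f||.
    """
    if not 0.0 < ratio < 1.0:
        raise ValueError("need 0 < delta*omega/pi < 1")
    const = (1.0 + ratio) / (1.0 - ratio)   # equals (pi + delta*omega)/(pi - delta*omega)
    n = 0
    while ratio ** (n + 1) * const > tol:
        n += 1
    return n

# CD-player example from the text: 4-fold oversampling, accuracy 0.1 %.
n_cd = iterations_needed(0.25, 1e-3)
```

For the ratio 1/4 and tolerance 0.001 the bound gives about 0.0016 after four iterations and about 0.0004 after five, which reproduces the count of five iterations stated in the text.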


4.2.  Disadvantages 

Condition (4.1) is stronger than (3.6) and does not allow any gaps in the sampling set. If gaps do occur, the frame algorithm is still applicable, however at the cost of a slower rate of convergence.

Remark 4.2.  The algorithms of Section 3 and Section 4 have been implemented and tested intensively by H.G. Feichtinger and his collaborators at the University of Vienna [2, 9]. From these experiments, it became clear that the best currently available reconstruction algorithm is (4.7) and (4.8), the "adaptive weights method." Since it is much easier to implement than the recursion of Theorem 4.1, it requires less time for the same number of iterations.

The  performance  of  the  ordinary  frame  method  gets  worse  with  in¬ 
creasing  randomness  of  the  sampling  set. 

Remark 4.3.  A detailed error analysis that applies to a large class of reconstruction algorithms will appear in [5].

Remark 4.4.  Theorem 4.1 has several interesting variations: complete reconstruction from local averages, random sampling with derivatives, and higher convergence rate through smoother approximation operators. We refer to [11] for detailed statements.


5.  Proof  of  Theorem  4.1  and  its  corollary 


For  the  proof  of  Theorem  4.1  we  need  the  following  well-known  inequalities. 

Lemma 5.1 (Wirtinger's inequality).  If $f, f' \in L^2(a, b)$ and either $f(a) = 0$ or $f(b) = 0$, then

$$\int_a^b |f(x)|^2\,dx \le \frac{4}{\pi^2}\,(b - a)^2 \int_a^b |f'(x)|^2\,dx. \qquad (5.1)$$

The lemma follows from [12, p. 184], by a change of variables. We use Wirtinger's inequality in the following form: If $f(c) = 0$ for $a < c < b$, then

$$\int_a^b |f(x)|^2\,dx \le \frac{4}{\pi^2}\,\max\big((c - a)^2, (b - c)^2\big) \int_a^b |f'(x)|^2\,dx \qquad (5.2)$$

which follows immediately from writing $\int_a^b = \int_a^c + \int_c^b$ and applying (5.1) to each term.
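A short worked check, added here for illustration, shows that the constant $4/\pi^2$ in (5.1) cannot be improved. The test function below is an assumed example; it vanishes at the left endpoint and turns (5.1) into an equality.

```latex
% Assumed test function: f(x) = \sin\!\big(\tfrac{\pi (x-a)}{2(b-a)}\big), so that f(a) = 0.
\begin{align*}
\int_a^b |f(x)|^2\,dx
  &= \int_a^b \sin^2\!\Big(\frac{\pi (x-a)}{2(b-a)}\Big)\,dx = \frac{b-a}{2},\\
\int_a^b |f'(x)|^2\,dx
  &= \frac{\pi^2}{4(b-a)^2}\int_a^b \cos^2\!\Big(\frac{\pi (x-a)}{2(b-a)}\Big)\,dx
   = \frac{\pi^2}{4(b-a)^2}\cdot\frac{b-a}{2},\\
\frac{4}{\pi^2}\,(b-a)^2 \int_a^b |f'(x)|^2\,dx
  &= \frac{b-a}{2} = \int_a^b |f(x)|^2\,dx .
\end{align*}
```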


Lemma 5.2 (Bernstein's inequality).  If $f \in B_\omega^2$, then $f' \in B_\omega^2$ and

$$\|f'\| \le \omega\,\|f\|. \qquad (5.3)$$


Proof (of Theorem 4.1).  Define

$$Af = P\Big(\sum_n f(x_n)\chi_n\Big). \qquad (5.4)$$

It is easily seen that $A$ is a bounded linear operator from $B_\omega^2$ into $B_\omega^2$ (see also (5.7) below). The iteration step (4.3) requires an estimate on $\|f - Af\|$ for $f \in B_\omega^2$. By writing $f \in B_\omega^2$ as $f = Pf = P\big(\sum_n f\chi_n\big)$ one obtains

$$\|f - Af\|^2 = \Big\| P\Big(\sum_n (f - f(x_n))\chi_n\Big) \Big\|^2 \le \Big\| \sum_n (f - f(x_n))\chi_n \Big\|^2 = \int \Big| \sum_n (f(x) - f(x_n))\chi_n(x) \Big|^2 dx.$$

Since the $\chi_n$'s are characteristic functions and have mutually disjoint support, the last expression equals

$$\int \Big( \sum_n |f(x) - f(x_n)|^2 \chi_n(x) \Big)\,dx = \sum_n \int_{y_{n-1}}^{y_n} |f(x) - f(x_n)|^2\,dx. \qquad (5.6)$$


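Next one applies Wirtinger's inequality (5.2) to each term:

$$\int_{y_{n-1}}^{y_n} |f(x) - f(x_n)|^2\,dx \le \frac{4}{\pi^2}\max\big((x_n - y_{n-1})^2, (y_n - x_n)^2\big) \int_{y_{n-1}}^{y_n} |f'(x)|^2\,dx \le \frac{\delta^2}{\pi^2} \int_{y_{n-1}}^{y_n} |f'(x)|^2\,dx$$

since $y_n - x_n \le \delta/2$ and $x_n - y_{n-1} \le \delta/2$.

Summing over $n$ and using Bernstein's inequality, one obtains

$$\|f - Af\|^2 \le \frac{\delta^2}{\pi^2} \sum_n \int_{y_{n-1}}^{y_n} |f'(x)|^2\,dx = \frac{\delta^2}{\pi^2}\,\|f'\|^2 \le \frac{\delta^2\omega^2}{\pi^2}\,\|f\|^2.$$

Thus we have obtained the basic estimate

$$\|f - Af\| \le \frac{\delta\omega}{\pi}\,\|f\| \qquad \text{for all } f \in B_\omega^2. \qquad (5.7)$$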

This means that for $\delta\omega/\pi < 1$ the operator $A$ is invertible on $B_\omega^2$ with the inverse

$$A^{-1} = \sum_{n=0}^{\infty} (\mathrm{Id} - A)^n. \qquad (5.8)$$

Consequently,

$$f = A^{-1}Af = \sum_{n=0}^{\infty} (\mathrm{Id} - A)^n Af. \qquad (5.9)$$

Setting $\varphi_0 = Af$ and

$$\varphi_n = (\mathrm{Id} - A)^n Af = (\mathrm{Id} - A)(\mathrm{Id} - A)^{n-1} Af = \varphi_{n-1} - A\varphi_{n-1} \qquad (5.10)$$

yields the algorithm (4.2)-(4.4). Since the start of the iteration is $\varphi_0$, the reconstruction indeed contains only the information on the samples $f(x_n)$. For the error estimate (4.5) we observe that with (4.4), (5.9) and (5.10)

$$f - f_n = \sum_{k=n+1}^{\infty} \varphi_k = \sum_{k=n+1}^{\infty} (\mathrm{Id} - A)^k Af. \qquad (5.11)$$

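From (5.7) one deduces

$$\|(\mathrm{Id} - A)^k Af\| \le \left(\frac{\delta\omega}{\pi}\right)^k \|Af\| \qquad (5.12)$$

and

$$\|Af\| \le \|f\| + \|Af - f\| \le \left(1 + \frac{\delta\omega}{\pi}\right)\|f\|. \qquad (5.13)$$

Combining these estimates yields

$$\|f - f_n\| \le \sum_{k=n+1}^{\infty} \left(\frac{\delta\omega}{\pi}\right)^k \left(1 + \frac{\delta\omega}{\pi}\right)\|f\| = \left(\frac{\delta\omega}{\pi}\right)^{n+1} \frac{\pi + \delta\omega}{\pi - \delta\omega}\,\|f\|. \qquad (5.14)$$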


Proof (of Corollary to 4.1).  The upper frame bound $B = (1 + \delta\omega/\pi)^2$ in (4.6) follows from (5.13).

For the lower bound $A = (1 - \delta\omega/\pi)^2$, we observe that by (5.8) and (5.7) $A^{-1}$ has the operator norm $\|A^{-1}\| \le (1 - \delta\omega/\pi)^{-1}$. The equality $\big\|\sum_{n\in\mathbb{Z}} f(x_n)\chi_n\big\|^2 = \sum_{n\in\mathbb{Z}} |f(x_n)|^2 w_n$ follows from a simple computation similar to (5.6). Altogether we obtain

$$\|f\| = \|A^{-1}Af\| \le \|A^{-1}\|\,\|P\|\,\Big\|\sum_{n\in\mathbb{Z}} f(x_n)\chi_n\Big\|$$

and everything is proved.

We study the a.s. convergence in the irregular sampling theorem. For
bandlimited  processes,  we  obtain  necessary  and  sufficient  conditions  for 
exact  recovery  of  almost  all  sample  paths  of  the  signal.  No  stationarity 
assumption  is  needed,  but  a  spectral  representation  is. 


I.  Introduction 

The sampling theorem, variously attributed in the regular case to Kotel'nikov, Shannon and Whittaker, has been the subject of numerous studies with both theoretical and applied flavors. This is reflected in the extensive bibliography available in the comprehensive works of Jerri [7] and Higgins [3]. However, as far as path reconstruction of stochastic processes
is  concerned,  rather  few  results  are  available.  Under  the  assumption  of  a 
stochastic  model,  the  usual  approach  in  the  literature  is  to  obtain,  say,  mean 
square  reconstruction  which  follows  quite  directly  from  the  deterministic 
sampling  results.  When  a  path  of  a  process  is  observed  and  sampled, 
reconstruction  "on  average,"  i.e.,  in  mean  square,  might  be  inadequate  and 
path  reconstruction  has  to  be  considered. 

Only a few papers are concerned with reconstructing the paths of a process. Firstly, Belayev [1] obtained exact reconstruction via the cardinal series for stationary processes under a guard band assumption, i.e., the process is bandlimited to $(-\pi, \pi)$ while the sampling rate is greater than $1/\pi$. Secondly, Piranashvili [10], still under a guard band assumption, extended Belayev's result to include some classes of nonstationary processes. Thirdly, Gaposhkin [2] gave, as a consequence of a general theorem on the almost sure convergence of stochastic integrals, a necessary and sufficient condition for path reconstruction of stationary processes bandlimited to $(-\pi, \pi)$, the sampling rate being $1/\pi$. Gaposhkin's result is also limited to uniform sampling points and so the recovery is achieved via the cardinal series.

† Research supported by ONR Contracts No. N00014-91-J-1003 and N00014-89-C-0310.

J. S. Byrnes et al. (eds.), Probabilistic and Stochastic Methods in Analysis, with Applications, 337-341. © 1992 Kluwer Academic Publishers. Printed in the Netherlands.

In  the  work  presented  below,  both  the  assumptions  on  the  stationarity 
of  the  process  and  the  regularity  of  the  samples  are  relaxed,  and  no  mo¬ 
ment  condition  is  needed.  A  criterion  is  provided  for  the  path  recovery 
of  some  classes  of  nonstationary  bandlimited  processes  using  irregularly 
spaced  samples. 

2.  Preparation 

Let $(\Omega, \mathcal{B}, P)$ be a probability space. For $0 \le \alpha \le 2$, let $L^\alpha(\Omega, \mathcal{B}, P)$ ($L^\alpha(P)$ for short) be the corresponding space of complex-valued random variables, equipped for $0 < \alpha \le 2$ with the (quasi-)norms $(E|\cdot|^\alpha)^{1/\alpha}$, while on $L^0(P)$ the topology is the one induced by convergence in probability. The main class of processes considered here have a spectral representation, namely, $X_t = \int_{\mathbb{R}} e^{it\lambda}\,dZ(\lambda)$, $t \in \mathbb{R}$, where the random measure $Z : \mathcal{B}(\mathbb{R}) \to L^\alpha(P)$, $0 \le \alpha \le 2$, is $\sigma$-additive. Using the terminology of [4, 5] (where the reader can find more details, examples, references) these processes are (bounded continuous) $(\alpha, \infty)$-bounded. Essential to our approach is the following result (again see [4, 5]).

Lemma  2.1.  Let  X  =  'X,li,c.j).  X,  e  L“(Q.B,J’),  be  (cx,oo1-bounded,  0  S; 
a  $  2,  with  random  measure  Zx-  Then,  there  exists  a  probability  space 
(O.B.T)  with  I  -^((P)  c  L^((P),  a  stationary  process  Y  =  Y,  ^  L"((P| 

and  a  random  variable  A  e  L'^“  such  that  Xt  =  APY|,  t  fB.  where 

P  is  the  orthogonal  projection  from  I  '^((P)  to  L^((P). 

In Lemma 2.1 $Z_Y$, the random measure of $Y$, is orthogonally scattered; hence there exists a finite positive measure $F$ such that

$$E\left|\int_{\mathbb{R}} f\,dPZ_Y\right|^2 \le \|P\|^2\, E\left|\int_{\mathbb{R}} f\,dZ_Y\right|^2 = \|P\|^2 \int_{\mathbb{R}} |f|^2\,dF$$

for all $f \in L^2(F)$.

Now  that  the  probability  material  has  been  given  let  us  state  another 
lemma  which  is  a  particular  case  of  a  beautiful  and  important  result  of 
Levinson  [8,  IVJ. 

Lemma 2.2.  Let $\{t_k\}_{k\in\mathbb{Z}}$ be a sequence of reals such that $\sup_k |t_k - k| < 1/4$. Then the set $\{e^{it_k x}\}_{k\in\mathbb{Z}}$ is complete in $L^2(-\pi, \pi)$ and there exists a unique biorthogonal set $\{h_k\}_{k\in\mathbb{Z}} \subset L^2(-\pi, \pi)$ such that, for any $g \in L^2(-\pi, \pi)$, the ordinary Fourier series of $g$ and the nonharmonic Fourier series $\sum_k e^{it_k x}\int_{-\pi}^{\pi} h_k(y)g(y)\,dy$ are uniformly equiconvergent on every compact subset of $(-\pi, \pi)$. The $h_k$ are given by:

$$\Psi_k(t) = \int_{-\pi}^{\pi} h_k(x)\,e^{itx}\,dx = \frac{G(t)}{G'(t_k)(t - t_k)},$$

where

$$G(t) = (t - t_0)\prod_{n=1}^{\infty}\left(1 - \frac{t}{t_n}\right)\left(1 - \frac{t}{t_{-n}}\right), \qquad t \in \mathbb{R}.$$


As mentioned above, Lemma 2.2 is a particular case (for $L^2(-\pi, \pi)$) of convergence results for non-harmonic Fourier series, which have their origins in Paley-Wiener [9]. The bound 1/4 is tight and is a sufficient condition for $\{e^{it_k x}\}_{k\in\mathbb{Z}}$ to form a bounded unconditional basis of $L^2(-\pi, \pi)$. The functions $\Psi_k$ are called Lagrange interpolating functions since

$$\Psi_k(t_n) = \begin{cases} 0, & \text{for } k \ne n \\ 1, & \text{for } k = n. \end{cases}$$

When $t_k = k$, $G(t) = \sin \pi t/\pi$ but, in general, no closed form expression is available for $G$.
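The special case $t_k = k$ can be checked numerically by truncating the infinite product for $G$. In the sketch below the function name `G_truncated`, the evaluation point and the truncation length are assumptions made for illustration; for irregular nodes the same function can be called with other node arrays, but then no closed form is available for comparison.

```python
import numpy as np

def G_truncated(t, nodes_pos, nodes_neg, t0=0.0):
    """(t - t0) * prod_n (1 - t/t_n)(1 - t/t_{-n}), truncated to the nodes supplied."""
    return (t - t0) * np.prod((1.0 - t / nodes_pos) * (1.0 - t / nodes_neg))

n = np.arange(1, 200001, dtype=float)
t = 0.3
approx = G_truncated(t, n, -n)           # regular nodes t_k = k
exact = np.sin(np.pi * t) / np.pi        # closed form quoted in the text
```

The relative truncation error of the product behaves like $t^2/N$ for $N$ retained factor pairs, so convergence is slow but adequate for a sanity check.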


3.  Reconstruction 


Throughout the next two sections, by a process $X = \{X_t\}_{t\in\mathbb{R}}$ we mean that $X$ is (bounded continuous) $(\alpha, \infty)$-bounded, $0 \le \alpha \le 2$, i.e., $X(t) = \int_{\mathbb{R}} e^{it\lambda}\,dZ(\lambda)$, $t \in \mathbb{R}$. Furthermore, $X$ is said to be bandlimited to $(-\pi, \pi)$ (the bounds $\pm\pi$ are just chosen for notational convenience) if $Z = 0$ a.s. $P$, outside of $(-\pi, \pi)$.


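Theorem 3.1.  Let $X$ be bandlimited to $(-\pi + \varepsilon, \pi - \varepsilon)$, let $\{t_k\}_{k\in\mathbb{Z}} \subset \mathbb{R}$ with $\sup_k |t_k - k| < 1/8$, and let $G(t) = (t - t_0)\prod_{n=1}^{\infty}(1 - t/t_n)(1 - t/t_{-n})$. Then

$$X(t) = \sum_{k=-\infty}^{+\infty} \frac{X(t_k)\,G(t)}{G'(t_k)(t - t_k)} \quad \text{a.s. } P,$$

uniformly on every compact subset of $\mathbb{R}$.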

When  the  dominating  measure  F  in  Lemma  2.1  has  some  degree  of 
smoothness,  reconstruction  is  also  always  possible. 


Theorem 3.2.  Let $X$ be second order stationary, bandlimited to $(-\pi, \pi)$. If for some $0 < \delta < 2$,

(log  log  -.j)  dF<-foo. 

Jo^  |A±7i|-'6  \  -  M  / 

Then,

$$X(t) = \sum_{k=-\infty}^{+\infty} \frac{X(t_k)\,G(t)}{G'(t_k)(t - t_k)} \quad \text{a.s. } P.$$

Although reconstruction in $L^\alpha(P)$ is always possible, the next result shows that when the paths are concerned, things can go very wrong.

We also note that the result below provides a bandlimited stationary process for which

$$\limsup_{n\to+\infty} \left| \sum_{k=-n}^{n} \frac{X(t_k)\,G(t)}{G'(t_k)(t - t_k)} \right| = +\infty \quad \text{a.s. } P.$$

This is in sharp contrast to Theorem 3.1.


Theorem 3.3.  Let $\{a_n\}_{n\ge 1}$ be a nondecreasing sequence of positive reals such that


1)  $a_n = o(\log\log n)$,

2)  there  exists  C I  >  0  such  that  •  •  ^Cia2",  n  large  enough. 


Let $\{t_k\}$ be any sampling sequence. Then, there exists a probability space $(\Omega_1, \mathcal{B}_1, P_1)$ and a bandlimited stationary process $X$ defined on this space such that with probability one ($P_1$),

$$\limsup_{n\to+\infty} \frac{1}{a_n} \left| \sum_{k=-n}^{n} \frac{X(t_k)\,G(t)}{G'(t_k)(t - t_k)} \right| = +\infty.$$


This work is still in progress. Extensions of these results, including the case of more general sampling sequences, will be published in [6].

We discuss the problem of estimating the spectral density function of a continuous 2-dimensional stationary random field when the data set is obtained from a point process on the plane. Second order spectral analysis of irregularly spaced data on a real line was first considered by Shapiro and Silverman [20]. General results of stationary interval functions were studied by Brillinger [5]. He established consistency results for general polyspectral estimates of the stationary interval functions. Consistency results and alias-free sampling schemes for the second-order case in the estimation of the spectral density function of a continuous time series were obtained by Masry [12, 11, 14]. We generalize these results to the estimation of spectral density functions on higher dimensions. Special attention is directed to the 2-dimensional case although general k-dimensional cases are considered also. Asymptotic bias and covariances are studied. In particular, it is shown explicitly how the information of the sampling process comes into play in obtaining a consistent estimate of the density function of a continuous random field. Estimates under Poisson's sampling scheme are studied in detail. A few simulation examples are given as illustrations.


1.  Introduction 

Statistical analysis of stationary spatial series has been given considerable attention in recent years with applications in many areas (see [8] and [19]). General discussions of spectral analysis of spatial series have been given by Brillinger [4] and Whittle [22]. This research is based on data sampled on an equally spaced grid. In most applications, however, observation stations are irregularly spaced. Masry [15, 16] introduced random acoustic arrays placed by an independent sampling from a given multidimensional probability density function to estimate the frequency-wave number spectrum of the ambient noise field. Inference of the spectral density function of a 2-dimensional point process is of interest also [2, 9]. This paper is concerned with the estimation of the spectral density function of a continuous 2-dimensional stationary random field which is sampled by an independent 2-dimensional point process.

Spectral analysis of irregularly spaced data on a real line was first considered by Shapiro and Silverman [20]. Brillinger [5] studied the general results of stationary interval functions. Consistency results and alias-free sampling schemes for estimating the spectral density function of a continuous time series were obtained by Masry [11, 14, 15]. Masry [13] provided an alternative way to estimate the spectral density function by using an orthogonal series method. These results are based on the 1-dimensional case. We generalize these results to the estimation of spectral density functions on higher dimensions. Our purpose is to find the spectral density function of a stationary spatial process $Y = \{Y(t),\ t \in \mathbb{R}^p\}$. Special attention is directed to the 2-dimensional case. The results obtained will be applicable to the analysis of data recorded at points in some region of a surface. This kind of data can be found in optics, forestry and geology.

Let $Y = \{Y(t),\ t \in \mathbb{R}^p\}$ be a stationary, zero-mean spatial process with finite fourth-order moments, continuous covariance function $R_Y(t)$, $t \in \mathbb{R}^p$, spectral density function $\phi_Y(\lambda)$, $\lambda \in \mathbb{R}^p$, and $k$th-order cumulants $Q_Y^{(k)}(u_1, \dots, u_{k-1})$, $u_1, \dots, u_{k-1} \in \mathbb{R}^p$; $k = 2, \dots$, where $\mathbb{R}^p$ denotes real Euclidean $p$-space. The point process $\{\tau_i\}$, $\tau_i \in \mathbb{R}^p$, is stationary and orderly, independent of $Y$, with finite fourth order moments. If $N(\cdot)$ is the counting process associated with $\{\tau_i\}$, then

•  For any positive integer $k$ and any collection $\{B_1, \dots, B_k\}$ of subsets in $\mathbb{R}^p$, with $B_i = \{(x_1, \dots, x_p) : a_{ij} < x_j \le b_{ij},\ a_{ij}, b_{ij} \in \mathbb{R},\ j = 1, \dots, p\}$ for $i = 1, \dots, k$, the joint distribution of the random variables $\{N(B_1 + h), \dots, N(B_k + h)\}$ is independent of $h \in \mathbb{R}^p$.

•  $P[N(B) \ge 2] = o(|B|)$ as $|B| \to 0$.

•  $E[N^4(B)] < \infty$, for all bounded $B$.

The results are true for $B_i \in \mathcal{B}^p$ where $\mathcal{B}^p$ is the Borel sets in $\mathbb{R}^p$. From now on we will concentrate on $p = 2$ (i.e., on the 2-dimensional case). Let $\beta$ be the mean intensity of the point process $N(\cdot)$. Then

$$E[N(dt)] = \beta\,dt \qquad (1.1)$$

$$\operatorname{cov}[N(dt), N(du)] = C_N(du)\,dt \qquad (1.2)$$

where $E$ denotes expectation and $N(dt)$ denotes the number of points in the region $(t_x, t_x + dt_x] \times (t_y, t_y + dt_y]$. $C_N$ is called the reduced covariance measure; $C_N$ has an atom at the origin, $C_N(\{0\}) = \beta$. The "sampled" process is taken to be

$$Z(B) = \sum_{\tau_i \in B} Y(\tau_i)$$

or in differential form

$$Z(dt) = Y(t)\,N(dt).$$

An estimate of the spectral density function of the random field $Y(t)$ is proposed when the "sampled" process $Z(dt)$ and the sampling process $N(dt)$ are observed.

If $N(\cdot)$ is a 2-dimensional Poisson process with events randomly occurring in the plane, then the number of events occurring in any region $A$ of area $|A|$ has the Poisson distribution with mean $\beta|A|$, and the events in non-overlapping areas are independent. The probability mass function of $N(A)$ is given by

$$P(N(A) = n) = \frac{e^{-\beta|A|}\,(\beta|A|)^n}{n!}, \qquad n = 0, 1, 2, \dots$$
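The Poisson sampling scheme described above is straightforward to simulate. The following sketch draws the number of points in a rectangle from the Poisson distribution and then places the points uniformly; the function names, the intensity value and the unit-square region are assumptions made for illustration, not part of the paper.

```python
import math
import numpy as np

def poisson_pmf(n, beta, area):
    """P(N(A) = n) = exp(-beta*|A|) * (beta*|A|)**n / n!."""
    mu = beta * area
    return math.exp(-mu) * mu ** n / math.factorial(n)

def sample_poisson_points(beta, width, height, rng):
    """Homogeneous Poisson process on [0, width] x [0, height]."""
    count = rng.poisson(beta * width * height)            # N(A) ~ Poisson(beta*|A|)
    return rng.uniform([0.0, 0.0], [width, height], size=(count, 2))

rng = np.random.default_rng(3)
beta = 50.0                                                # assumed mean intensity
counts = np.array([len(sample_poisson_points(beta, 1.0, 1.0, rng)) for _ in range(2000)])
mean_count = counts.mean()                                 # should be close to beta*|A| = 50
pmf_total = sum(poisson_pmf(n, beta, 1.0) for n in range(150))
```

The simulated locations play the role of the sampling points $\tau_i$; evaluating a random field at them would give the "sampled" process $Z$.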


In Section 2, second order cumulants and the spectrum of a continuous 2-dimensional random process $Y(t)$ are introduced. Their relation with the sampled process and the sampling point process is derived. Consistent estimates of the 2nd-order spectrum of $Y(t)$ are proposed in Section 3 based on the 2-dimensional Poisson sampling scheme. A simple simulation example is presented in Section 4.


2.  2nd  order  cumulant  and  spectrum 

Assumption 2.1.  The process $Y = \{Y(t),\ t = (t(1), t(2)) \in \mathbb{R}^2\}$ is a continuous 2-dimensional stationary random process with mean $m_Y = 0$, autocovariance function $R_Y(u)$, $u = (u(1), u(2)) \in \mathbb{R}^2$, and has finite fourth order moments, and the fourth order cumulants $Q_Y^{(4)}(t_1, t_2, t_3)$, $t_1, t_2, t_3 \in \mathbb{R}^2$. The 2nd order cumulant function of $Y(t)$ is

$$R_Y(u) = \operatorname{Cov}\{Y(t), Y(t + u)\} = \operatorname{Cum}\{Y(t), Y(t + u)\} \qquad (2.1)$$

which satisfies

$$\int |u(i)|\,|R_Y(u(1), u(2))|\,du(1)\,du(2) < \infty, \qquad i = 1, 2.$$

The spectrum of $Y(t)$ is given by

$$\phi_Y(\lambda) = \frac{1}{(2\pi)^2} \int R_Y(u)\,e^{-iu\cdot\lambda}\,du, \qquad \lambda = (\lambda(1), \lambda(2)) \in \mathbb{R}^2. \qquad (2.2)$$

We are concerned with the estimation of $\phi_Y(\lambda)$, $\lambda \in \mathbb{R}^2$, given random samples of $Y(t)$, $t \in \mathbb{R}^2$, from a 2-dimensional point process $N(\cdot)$.

Under our assumptions in the introduction, the "sampled" process Z
has finite fourth order moments. In particular

$$E[Z(dt)] = E[Y(t)]\,E[N(dt)] = 0$$

and

$$\mu_Z(du)\,dt = E[Z(dt)\,Z(du)] \tag{2.3}$$

$$= E[Y(t)\,N(dt)\,Y(t+u)\,N(du)]$$

$$= R_Y(u)\,[\beta^2\,du + C_N(du)]\,dt \tag{2.4}$$

so that

$$\mu_Z(B) = \int_B R_Y(u)\,[\beta^2\,du + C_N(du)], \qquad B \in \mathcal{B}^2 \tag{2.5}$$

is a $\sigma$-finite signed measure on $\mathcal{B}^2$. If we define the $\sigma$-finite measure

$$M_N(B) = \int_B [\beta^2\,du + C_N(du)], \qquad B \in \mathcal{B}^2,$$

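then we can rewrite $\mu_Z(B)$ as

$$\mu_Z(B) = \int_B R_Y(u)\,M_N(du).$$

If the covariance density function $c_N(u)$ exists, i.e.,

$$C_N(B) = \beta\,\delta_0(B) + \int_B c_N(u)\,du, \qquad B \in \mathcal{B}^2, \tag{2.6}$$

then

$$\mu_Z(B) = \beta R_Y(0)\,\delta_0(B) + \int_B R_Y(u)\,[\beta^2 + c_N(u)]\,du \tag{2.7}$$

where

$$\delta_0(B) = \begin{cases} 1, & \text{if } 0 \in B \\ 0, & \text{otherwise.} \end{cases}$$

We define the spectral density $\phi_Z(\lambda)$ of the "sampled" process Z by

$$\phi_Z(\lambda) = \frac{1}{(2\pi)^2}\int e^{-iu\cdot\lambda}\,\mu_Z(du) \tag{2.8}$$

$$= \beta^2\phi_Y(\lambda) + \frac{\beta R_Y(0)}{(2\pi)^2} + \frac{1}{(2\pi)^2}\int R_Y(u)\,c_N(u)\,e^{-iu\cdot\lambda}\,du. \tag{2.9}$$

If we assume that $c_N(u) \in L_1$ with

$$\psi_N(\lambda) = \frac{1}{(2\pi)^2}\int e^{-iu\cdot\lambda}\,c_N(u)\,du$$

then

$$c_N(u) = \int \psi_N(\lambda)\,e^{iu\cdot\lambda}\,d\lambda$$

and

$$\frac{1}{(2\pi)^2}\int R_Y(u)\,c_N(u)\,e^{-iu\cdot\lambda}\,du
= \frac{1}{(2\pi)^2}\int R_Y(u)\,e^{-iu\cdot\lambda}\int \psi_N(v)\,e^{iu\cdot v}\,dv\,du$$

$$= \int\Bigl[\frac{1}{(2\pi)^2}\int R_Y(u)\,e^{-iu\cdot(\lambda - v)}\,du\Bigr]\psi_N(v)\,dv
= \int \phi_Y(\lambda - v)\,\psi_N(v)\,dv.$$

Hence we can rewrite $\phi_Z(\lambda)$ as

$$\phi_Z(\lambda) = \beta^2\phi_Y(\lambda) + \frac{\beta R_Y(0)}{(2\pi)^2} + \int \phi_Y(\lambda - u)\,\psi_N(u)\,du. \tag{2.10}$$

Note that for a 2-dimensional Poisson point process, $c_N(u) = 0$, so that

$$\phi_Z(\lambda) = \beta^2\Bigl[\phi_Y(\lambda) + \frac{R_Y(0)}{(2\pi)^2\beta}\Bigr]$$

and

$$\phi_Y(\lambda) = \frac{\phi_Z(\lambda)}{\beta^2} - \frac{R_Y(0)}{(2\pi)^2\beta}. \tag{2.11}$$

However, if $c_N(u) \neq 0$, then we need to solve

$$\phi_Z(\lambda) = \beta^2\phi_Y(\lambda) + \frac{\beta R_Y(0)}{(2\pi)^2} + \int \phi_Y(\lambda - u)\,\psi_N(u)\,du.$$

We note that when the point process $N(\cdot)$ is Poisson, the problem
reduces to the case in [15, 16] with uniform density function and ignoring
the time-frequency part. We will derive the following proposition.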


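Proposition 2.2. If

$$\gamma(u) = \frac{c_N(u)}{\beta^2 + c_N(u)} \in L_1 \tag{2.12}$$

with Fourier transform

$$\Gamma(\lambda) = \frac{1}{(2\pi)^2}\int e^{-iu\cdot\lambda}\,\gamma(u)\,du \in L_1,$$

then (2.10) can be inverted and we have

$$\phi_Y(\lambda) = \frac{1}{\beta^2}\Bigl\{\phi_Z(\lambda) - \frac{\beta R_Y(0)}{(2\pi)^2}
- \int \Gamma(\lambda - u)\Bigl[\phi_Z(u) - \frac{\beta R_Y(0)}{(2\pi)^2}\Bigr]du\Bigr\}. \tag{2.13}$$

Proof. From (2.7) and (2.12) we can obtain

$$\mu_Z(du) = \beta R_Y(0)\,\delta_0(du) + R_Y(u)\,[\beta^2 + c_N(u)]\,du,$$

$$R_Y(u)\,du = \frac{1}{\beta^2 + c_N(u)}\,[\mu_Z(du) - \beta R_Y(0)\,\delta_0(du)]$$

$$= \frac{1}{\beta^2}\,[1 - \gamma(u)]\,[\mu_Z(du) - \beta R_Y(0)\,\delta_0(du)].$$

Then we take the Fourier transform on both sides, and we have

$$\frac{1}{(2\pi)^2}\int R_Y(u)\,e^{-iu\cdot\lambda}\,du
= \frac{1}{\beta^2}\Bigl\{\frac{1}{(2\pi)^2}\int e^{-iu\cdot\lambda}\,\mu_Z(du) - \frac{\beta R_Y(0)}{(2\pi)^2}$$

$$\qquad - \frac{1}{(2\pi)^2}\int \gamma(u)\,e^{-iu\cdot\lambda}\,\mu_Z(du)
+ \frac{1}{(2\pi)^2}\int \gamma(u)\,\beta R_Y(0)\,e^{-iu\cdot\lambda}\,\delta_0(du)\Bigr\}$$

and

$$\phi_Y(\lambda) = \frac{1}{\beta^2}\Bigl\{\phi_Z(\lambda) - \frac{\beta R_Y(0)}{(2\pi)^2}
- \int \Gamma(\lambda - u)\Bigl[\phi_Z(u) - \frac{\beta R_Y(0)}{(2\pi)^2}\Bigr]du\Bigr\}. \tag{2.14}$$

Note that, if $N(\cdot)$ is a 2-dimensional Poisson point process, then the power
spectrum of the continuous spatial series Y(t), $t \in R^2$, can be calculated from
the power spectrum of the "sampled" process Z by subtracting a constant
and multiplying by a constant. In the general case (2.14) can be used to obtain
an estimate of $\phi_Y(\lambda)$. ∎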
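For the Poisson case, the correction in (2.11) is a single subtraction and a single division. The sketch below illustrates it as a round trip, assuming $\beta$ and $R_Y(0)$ are known; the function name and the numerical values are made up for illustration only.

```python
import math


def poisson_corrected_spectrum(phi_z, beta, r_y0):
    """Recover phi_Y(lambda) from phi_Z(lambda) under Poisson sampling:
    subtract beta * R_Y(0) / (2*pi)^2, then divide by beta^2."""
    return (phi_z - beta * r_y0 / (2.0 * math.pi) ** 2) / beta ** 2


# Round trip with illustrative (made-up) numbers.
beta, r_y0, phi_y = 2.0, 1.5, 0.25
phi_z = beta ** 2 * phi_y + beta * r_y0 / (2.0 * math.pi) ** 2
recovered = poisson_corrected_spectrum(phi_z, beta, r_y0)
```

In practice $\beta$, $R_Y(0)$ and $\phi_Z(\lambda)$ would be replaced by estimates, so the recovered value inherits their sampling error.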


3. Estimation of 2-dimensional power spectrum

For simplicity, we assume that the sampling scheme N(dt) is a 2-dimensional
Poisson process which is independent of the 2-dimensional continuous random
process Y(t), $t \in R^2$. Suppose that the continuous random process Y(t) satisfies
Assumption 2.1 and that a realization of the process $N(\cdot)$ is observed at
$\tau_1, \tau_2, \ldots, \tau_{N(T)}$ in the region $\mathcal{D}_T = (0, T] \times (0, T]$, $0 < T < \infty$,
where $\mathcal{D}_T$ is an expanding subregion of $R^2$. The observed process is
$Z(dt) = Y(t)N(dt)$. We propose the following statistics to estimate the
spectral density function $\phi_Y(\lambda)$ (see (2.11)):

$$\hat{\beta} = \frac{N(T)}{T^2} \tag{3.1}$$

$$\hat{\phi}_Z(\lambda) = \int W_T(\lambda - u)\,I_T(u)\,du \tag{3.2}$$

$$\hat{R}_Y(0) = \frac{1}{N(T)}\sum_{k=1}^{N(T)} Y^2(\tau_k) \tag{3.3}$$

$$I_T(\lambda) = \frac{1}{(2\pi T)^2}\Bigl|\sum_{k=1}^{N(T)} Y(\tau_k)\,e^{-i\tau_k\cdot\lambda}\Bigr|^2 \tag{3.4}$$

where $W_T$ is a spectral window which satisfies certain regularity conditions,
as in Assumption 4.2 in [5], for some appropriate bandwidth $B_T$, and is given by

$$W_T(\theta) = \frac{1}{B_T^2}\,W\Bigl(\frac{\theta}{B_T}\Bigr), \qquad \theta \in R^2.$$


Lemma 3.1. Let $N(\cdot)$ be a 2-dimensional Poisson process with mean intensity
$\beta$, i.e., a process of events occurring randomly in the plane. Then

• for any bounded region A of area $|A|$, the number of events in that
region has a Poisson distribution with mean $\beta|A|$;

• the numbers of events in non-overlapping regions are independent;

and

$$E\,N(A) = \beta|A|,$$

$$\operatorname{Cum}\{N(A), N(A)\} = \beta|A|,$$

$$\operatorname{Cum}\{N(A), N(A), N(A)\} = \beta|A|,$$

$$\operatorname{Cum}\{N(A), N(A), N(A), N(A)\} = \beta|A|.$$

Lemma 3.2. Assume $t(j)\,R_Y(t(1), t(2)) \in L_1$, $j = 1, 2$. Then

$$E[I_T(\lambda)] = \beta^2\phi_Y(\lambda) + \frac{\beta R_Y(0)}{(2\pi)^2} + O\Bigl(\frac{1}{T}\Bigr)
= \phi_Z(\lambda) + O\Bigl(\frac{1}{T}\Bigr). \tag{3.5}$$

Proof. See Appendix B. ∎

Lemma 3.3. Assume $|t_j|^k\,Q_Y^{(4)}(t_1, t_2, t_3) \in L_1$, $t_j \in R^2$, $j = 1, 2, 3$, $k = 0, 1$.

Then

Cov{It(A(1),A(2)).It(h(1),h(2))} 

=  ^4)i(A(l),A(2)){AT(A(l)  +  n(1))Ar(A(2)  +  4(2)) 
+At(A(1)-4(1))At(A(2)-4(2))) 

+o(^)  (3.6) 

where 


Arlto) 


tu  e  R. 


Proof.  See  Appendix  B.  | 

From Lemma 3.3, we see that $\operatorname{Cov}(I_T(\lambda), I_T(\mu))$ is asymptotically zero
when $\lambda \pm \mu \neq 0$. To obtain consistent estimates of $\phi_Y(\lambda)$ we need to obtain
consistent estimates of $\phi_Z(\lambda)$ in (2.11). One can use the usual smoothed
periodogram given in (3.2); details are omitted here. Alternatively, one can
partition the region $(0, T] \times (0, T]$ into $m^2(T)$ subregions of size
$(0, T/m(T)] \times (0, T/m(T)]$ each. If $m(T) \to \infty$ and $T/m(T) \to \infty$ when
$T \to \infty$, then the average of the $m^2(T)$ periodograms $I_{T/m(T)}$ is a consistent
estimate of $\phi_Z(\lambda)$. This is used in the following example.
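The periodogram of the sampled process and its block-averaged version can be sketched as follows. This is a minimal illustration that assumes the periodogram has the form $(2\pi T)^{-2}\,|\sum_k Y(\tau_k)e^{-i\tau_k\cdot\lambda}|^2$ used in Appendix B; the function names and the toy data are hypothetical.

```python
import cmath
import math


def periodogram(points, values, lam, side):
    """I_T(lam) = |sum_k Y(tau_k) exp(-i tau_k . lam)|^2 / (2 pi T)^2
    for samples `values` observed at `points` in a square of side T = `side`."""
    total = sum(y * cmath.exp(-1j * (tx * lam[0] + ty * lam[1]))
                for (tx, ty), y in zip(points, values))
    return abs(total) ** 2 / (2.0 * math.pi * side) ** 2


def block_averaged_periodogram(points, values, lam, side, m):
    """Average the periodograms of the m*m sub-squares of side T/m."""
    sub = side / m
    blocks = {}
    for (tx, ty), y in zip(points, values):
        # Index of the sub-square (k*sub, (k+1)*sub] that contains the point.
        kx = min(max(math.ceil(tx / sub) - 1, 0), m - 1)
        ky = min(max(math.ceil(ty / sub) - 1, 0), m - 1)
        pts, vals = blocks.setdefault((kx, ky), ([], []))
        pts.append((tx, ty))
        vals.append(y)
    # Empty sub-squares contribute a periodogram of zero to the average.
    total = sum(periodogram(p, v, lam, sub) for p, v in blocks.values())
    return total / (m * m)


single = periodogram([(0.3, 0.7)], [2.0], (1.0, -2.0), 1.0)
averaged = block_averaged_periodogram(
    [(0.25, 0.25), (0.75, 0.75)], [1.0, 1.0], (0.5, 0.5), 1.0, 2)
```

Because the periodogram depends only on the modulus of the sum, the sub-square coordinates do not need to be shifted to a common origin before averaging.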


4.  An  example 

Here  is  a  simple  2-dimensional  AR(2)  process  which  is  generated  with  a 
given  spectrum 

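$$S(\lambda_1, \lambda_2) = (2\pi)^{-2}\,\bigl|1 + 0.2e^{-i\lambda_1} + 0.3e^{-i\lambda_2} - 0.24e^{-i(\lambda_1 + \lambda_2)}\bigr|^{-2}, \qquad |\lambda_1|, |\lambda_2| \le \pi.$$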

A plot of $S(\lambda_1, \lambda_2)$ is given in Figure A.1. There are different ways to generate a
stationary process with a given spectrum. We use the one described in [7].
A realization Z(x, y) at $(x, y) \in R^2$ is given by

$$Z(x, y) = \sqrt{2}\sum_{i,j}\bigl(S(\lambda_i, \lambda_j)\,\Delta\lambda_i\,\Delta\lambda_j\bigr)^{1/2}\cos\bigl(x\lambda'_i + y\lambda'_j + U_{i,j}\bigr)$$

where $\lambda_i, \lambda_j$ are a discretization of the "support" of $S(\lambda_1, \lambda_2)$; $\lambda'_i, \lambda'_j$ are a
jittered version of $\lambda_i, \lambda_j$; and the $U_{i,j}$ are i.i.d. uniform in $(0, 2\pi)$.

A Poisson sample is obtained on (0, 32) × (0, 32) with N(32) = 1024.
The periodogram is given in Figure A.2. The average of 10 periodograms
from 10 independent realizations of a Poisson sampling process is given in
Figure A.3. We see that the estimate is close to the true one.

Figure A.2: 2-d power spectrum using Poisson sampling scheme

Figure A.3: 2-d smoothed periodogram using Poisson sampling scheme
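The cosine-sum construction of a realization can be sketched as below. This is only an illustration of the general technique: the grid size, the jitter rule (uniform within each frequency cell) and the flat test spectrum are assumptions made here, not details taken from [7] or from the example above.

```python
import math
import random


def make_field(spectrum, n_freq, lam_max, rng):
    """Build one realization of a stationary field as a sum of cosines with
    random phases, using an n_freq x n_freq grid on [-lam_max, lam_max]^2."""
    step = 2.0 * lam_max / n_freq
    terms = []
    for i in range(n_freq):
        for j in range(n_freq):
            # Cell centre (the discretization) and a jittered frequency in the cell.
            ci = -lam_max + (i + 0.5) * step
            cj = -lam_max + (j + 0.5) * step
            li = ci + (rng.random() - 0.5) * step
            lj = cj + (rng.random() - 0.5) * step
            amp = math.sqrt(spectrum(ci, cj) * step * step)
            phase = rng.uniform(0.0, 2.0 * math.pi)
            terms.append((amp, li, lj, phase))

    def field(x, y):
        return math.sqrt(2.0) * sum(a * math.cos(x * li + y * lj + u)
                                    for a, li, lj, u in terms)

    return field, terms


def flat_spectrum(l1, l2):
    """Hypothetical white spectrum with unit total power on [-pi, pi]^2."""
    return 1.0 / (2.0 * math.pi) ** 2


field, terms = make_field(flat_spectrum, 8, math.pi, random.Random(1))
value = field(1.5, 2.5)
power = sum(a * a for a, _, _, _ in terms)  # variance of the synthesised field
```

With independent uniform phases, the variance of the synthesised field equals the sum of the squared amplitudes, which approximates the integral of the spectrum over the grid.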
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B.  Proofs 


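Proof (of Lemma 3.2).

$$E[I_T(\lambda)] = E\Bigl[\frac{1}{(2\pi T)^2}\Bigl|\sum_{k=1}^{N(T)} Y(\tau_k)\,e^{-i\tau_k\cdot\lambda}\Bigr|^2\Bigr]
= \frac{1}{(2\pi T)^2}\,E\Bigl[\int_{\mathcal{D}_T} e^{-it\cdot\lambda}\,Z(dt)\int_{\mathcal{D}_T} e^{is\cdot\lambda}\,Z(ds)\Bigr]$$

$$= \frac{1}{(2\pi T)^2}\int_{\mathcal{D}_T}\int_{\mathcal{D}_T} e^{-i(t-s)\cdot\lambda}\,E[Z(dt)Z(ds)]$$

where $\mathcal{D}_T = (0, T] \times (0, T]$. Note that

$$E[Z(dt)Z(ds)] = E[Y(t)Y(s)N(dt)N(ds)] = E[Y(t)Y(s)]\,E[N(dt)N(ds)]$$

$$= R_Y(t - s)\,[\beta^2\,dt\,ds + \beta\,\delta(t - s)\,dt\,ds]$$

where $\delta(x)$ is the 2-dimensional Dirac delta function. Thus we have

$$E[I_T(\lambda)] = \frac{1}{(2\pi T)^2}\int_{\mathcal{D}_T}\int_{\mathcal{D}_T} e^{-i(t-s)\cdot\lambda}\,R_Y(t - s)\,[\beta^2\,dt\,ds + \beta\,\delta(t - s)\,dt\,ds]$$

$$= \frac{\beta^2}{(2\pi T)^2}\int_{\mathcal{D}_T}\int_{\mathcal{D}_T} e^{-i(t-s)\cdot\lambda}\,R_Y(t - s)\,dt\,ds + \frac{\beta R_Y(0)}{(2\pi)^2}$$

where

$$\frac{\beta^2}{(2\pi T)^2}\int_{\mathcal{D}_T}\int_{\mathcal{D}_T} e^{-i(t-s)\cdot\lambda}\,R_Y(t - s)\,dt\,ds$$

$$= \frac{\beta^2}{(2\pi T)^2}\int_0^T\!\!\int_0^T\!\!\int_0^T\!\!\int_0^T e^{-i(t(1)-s(1))\lambda(1) - i(t(2)-s(2))\lambda(2)}\,R_Y(t(1) - s(1), t(2) - s(2))\,dt(1)\,dt(2)\,ds(1)\,ds(2)$$

$$= \frac{\beta^2}{(2\pi T)^2}\int_0^T\!\!\int_0^T e^{-i(t(2)-s(2))\lambda(2)}\Bigl[\int_{-T}^{T}(T - |u(1)|)\,e^{-iu(1)\lambda(1)}\,R_Y(u(1), t(2) - s(2))\,du(1)\Bigr]dt(2)\,ds(2)$$

$$= \frac{\beta^2}{(2\pi T)^2}\int_{-T}^{T}\!\!\int_{-T}^{T}(T - |u(1)|)(T - |u(2)|)\,e^{-iu(1)\lambda(1) - iu(2)\lambda(2)}\,R_Y(u(1), u(2))\,du(1)\,du(2)$$

$$= \frac{\beta^2}{(2\pi)^2}\int_{-T}^{T}\!\!\int_{-T}^{T}\Bigl(1 - \frac{|u(1)|}{T} - \frac{|u(2)|}{T} + \frac{|u(1)||u(2)|}{T^2}\Bigr)e^{-iu(1)\lambda(1) - iu(2)\lambda(2)}\,R_Y(u(1), u(2))\,du(1)\,du(2).$$

Then, as $T \to \infty$, the assumption that $u(j)\,R_Y(u(1), u(2)) \in L_1$, $j = 1, 2$, gives the
following asymptotic result:

$$\frac{\beta^2}{(2\pi T)^2}\int_{\mathcal{D}_T}\int_{\mathcal{D}_T} e^{-i(t-s)\cdot\lambda}\,R_Y(t - s)\,dt\,ds = \beta^2\phi_Y(\lambda) + O\Bigl(\frac{1}{T}\Bigr).$$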
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=  E 


p— i(r-r*  ).A 


Z(dr)Z(dr*) 


"Z(ds)Z(ds’l 


f-l 


p-i|r-r*  )  A 


Rv(r-r’)E(N(dT)N(dr*l 


*‘RY(s-s*)E[N(ds)N(ds') 


J 


g-i(r-r  I  ARy(^  [p^drdr*  +  |36{r  -  r'jdrdr*] 


e  *‘Ry(s  -  s')  [li^sds*  +  (36(s  -  s'ldsds'i 

JcH-  Jw*  ' 


W; 


,  — i|r  — r'  I  A 


Rylr  -  r‘)drdr' 


e-‘''--‘’>‘RY(s-s*)dsds' 

'A3;  JtK; 


+  |3n^RY(0) 


'J3-;  J 


p—  i(  r—  r*  1  A 


Ry(t  -  r'ldrdr'  +  |3M^Ry(0) 


p-il: — s*  I'Ur 


93- 


+  T‘’|3^R^(0) 

where $\delta(u)$ is the 2-dimensional Dirac delta function.

Note that

$$E[Z(dr)Z(dr^*)Z(ds)Z(ds^*)]$$

$$= E[Y(r)Y(r^*)Y(s)Y(s^*)\,N(dr)N(dr^*)N(ds)N(ds^*)]$$

$$= E[Y(r)Y(r^*)Y(s)Y(s^*)]\;E[N(dr)N(dr^*)N(ds)N(ds^*)].$$

Because the mean of the stationary process is zero, we have

$$E[Y(r)Y(r^*)Y(s)Y(s^*)] = Q_Y^{(4)}(r - s^*, r^* - s^*, s - s^*) + R_Y(r - r^*)R_Y(s - s^*)$$

$$\qquad + R_Y(r - s)R_Y(r^* - s^*) + R_Y(r - s^*)R_Y(r^* - s) \tag{B.1}$$

and

$$E[N(dr)N(dr^*)N(ds)N(ds^*)] \tag{B.2.1}$$

$$= \operatorname{Cum}\{N(dr), N(dr^*), N(ds), N(ds^*)\} \tag{B.2.2}$$

$$+ \beta\,dr\,\operatorname{Cum}\{N(dr^*), N(ds), N(ds^*)\} \tag{B.2.3}$$

$$+ \beta\,dr^*\,\operatorname{Cum}\{N(dr), N(ds), N(ds^*)\}$$

$$+ \beta\,ds\,\operatorname{Cum}\{N(dr), N(dr^*), N(ds^*)\}$$

$$+ \beta\,ds^*\,\operatorname{Cum}\{N(dr), N(dr^*), N(ds)\}$$

$$+ \operatorname{Cum}\{N(dr), N(dr^*)\}\operatorname{Cum}\{N(ds), N(ds^*)\} \tag{B.2.4}$$

$$+ \operatorname{Cum}\{N(dr), N(ds)\}\operatorname{Cum}\{N(dr^*), N(ds^*)\} \tag{B.2.5}$$

$$+ \operatorname{Cum}\{N(dr), N(ds^*)\}\operatorname{Cum}\{N(dr^*), N(ds)\} \tag{B.2.6}$$

$$+ \beta^2\,dr\,dr^*\,\operatorname{Cum}\{N(ds), N(ds^*)\} \tag{B.2.7}$$

We partition (B.1) into four parts. First, we compute

Equation (B.2.1) has 15 terms. Computations of these terms are very similar.
Here we consider $Q_Y^{(4)}(r - s^*, r^* - s^*, s - s^*)$ multiplying (B.2.2), (B.2.3),
(B.2.4), (B.2.7) and (B.2.12) only.
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The  result  of  part  2  is  0  ( =jJ7 )  uniformly  in  A  and  u-  For  the  third  part, 
we  compute 
Here we still have 15 terms. We compute the dominant terms only; the
others can be obtained by using the method in Part 2. The dominant terms
are (B.2.5), (B.2.8), (B.2.11), (B.2.12).

The proof of the fourth part is the same as that of part three; we find
only four dominant terms, which are (B.2.2), (B.2.9), (B.2.10), (B.2.12). The
fourth part equals
Automatic target recognition in images from sources such as video cameras,
infrared cameras, radar, satellites, etc., is an area of increasing importance
and growing concern. Pattern recognition in images largely depends on two
important factors: (a) a good feature extraction technique, and (b) good
feature recognition and classification. Many image processing techniques and
tools are available today to improve the quality of acquired images and
thereby enhance the feature extraction process. Feature extraction depends
on the nature of the image data and application. Statistical feature
extraction methods treat patterns as points in a multi-dimensional
measurement space. The statistical methods normally consider relationships,
such as joint probability distribution, interpoint distances, and scatter
matrices, to define patterns. There is, however, another approach. Recently
there has been a sudden surge in research activity in the area of feature
detection and recognition using neural networks. We will discuss the
application of two interesting neural network structures based on
(a) probabilistic neural networks (PNNs) and (b) self-organising neural
networks. The self-organising neural net structure is built on adaptive
resonance theory (ART), as propounded by Grossberg and Carpenter [1].


I.  Introduction 

Artificial neural networks (ANN) are computational models built around
massively parallel interconnected processing elements, as in biological
nervous systems. ANN models attempt to achieve human-like performance in
real time. There has been a sudden surge in the development and use of
ANN models to solve a wide variety of information processing problems,
leading to the emergence of a fundamentally new and different approach to
information processing and, hence, computing. This new approach, called
neurocomputing, seems to be the alternative to "programmed computing",
which has dominated information processing for the last 45 years. Robert
Hecht-Nielsen defines neurocomputing as the new technological discipline
concerned with information processing systems that autonomously develop
operational capabilities in adaptive response to an information environment
[5]. The following section briefly discusses some fundamental aspects of
neurocomputing, while the later sections discuss the applications of two
ANN models, viz., the adaptive resonance theory (ART) based structure and
the probabilistic neural network (PNN), in pattern recognition.

2.  Fundamentals  of  ANN 

The fundamental unit of the complete information processing system in our
brain is the neuron, which is a stand-alone analogue logical processing unit.
Each neuron is a simple microprocessing unit which receives and combines
signals from many other neurons through input processes called dendrites.
See Figure 2.1 for the structure of a neuron. Signals to dendrites are
communicated through specialised junctions called synapses. The input
signals are weighted at these junctions; the synaptic strength determines
the weight. The input signals are combined at the cell body (soma).

If the combined signal is strong enough, it activates the firing of the
neuron. This produces an output signal which travels along a long
transmission-line-like structure, called an axon, which can be up to about
a metre long. The information transfer across a synapse is chemical in
nature, but we can measure the effect as an electrical potential. The
chemical involved is called a neurotransmitter and is released whenever the
connection is made. The synaptic strengths (as determined by the amount of
neurotransmitter released) are what is modified when the brain learns. The
synapses, along with the processing information of the neuron, form the
basic memory mechanism of the brain. The brain consists of tens of billions
of neurons, all interconnected to form the biological neural network.

The neuron, therefore, is a basic computing element in the brain. The
neuron, like a microprocessor, receives many inputs, weights them, combines
them and finally produces an output through a threshold function.


2.1.  Artificial  (electronic)  neurons 

In  an  artificial  neural  network,  the  unit  analogous  to  the  neuron  is  called  a 
processing  element  (electronic  neuron).  A  processing  element  (PE)  may  have 
many  inputs  and  only  one  output.  The  inputs  are  algebraically  summed. 
The  combined  input  is  then  modified  by  a  nonlinear  activation  function 
or  a  transfer  function.  The  activation  function  could  also  be  a  threshold 
function.  The  output  of  a  PE  can  be  connected  to  inputs  of  other  PE's 
through  connection  weights  (synaptic  strength).  Figure  2.2  illustrates  this 
basic  building  block  of  an  ANN. 
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Figure  2.1:  The  biological  neuron 


Figure 2.2: An artificial neuron. $S_j = \sum_i W_{ij} X_i$ and $Y_j = f(S_j)$
(activation function).

If there are N inputs to a neuron, then the net input to the neuron is
given by:

$$S_j = \sum_{i=0}^{N-1} W_{ij} X_i - \theta_j \tag{2.1}$$

where

$S_j$: the net input to the j-th neuron.

$X_i$: inputs to the neuron.

$W_{ij}$: connection weights between the i-th input and the j-th neuron.

$\theta_j$: the bias value above which the neuron fires.

The output of the neuron is a nonlinear function of the net input.
A common activation function is the sigmoid function. The output $Y_j$ of the
j-th neuron can be described as:

$$Y_j = f(S_j) = \frac{1}{1 + e^{-S_j}} \tag{2.2}$$

for a sigmoid function.

Figure 2.3: An ANN structure

2.2. ANN structure

A neural network consists of many interconnected processing elements.
The PEs are normally organised into groups called layers or slabs. There are
typically two layers which connect an ANN to the outside world: an input
layer of neurons, where the data is presented to the network as input, and
an output layer of neurons, which holds the result. Figure 2.3 illustrates an
ANN formed using the PEs as basic building blocks. Any layer(s) of neurons
between the input and output slabs is (are) known as hidden layer(s).
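The weighted-sum-then-activation behaviour of a processing element can be sketched in a few lines. The example assumes the standard logistic sigmoid as the activation function; the weights, biases and inputs are made-up illustrative values, and the function names are hypothetical.

```python
import math


def sigmoid(s):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-s))


def neuron_output(inputs, weights, theta):
    """Net input S = sum_i W_i * X_i - theta, passed through the sigmoid."""
    net = sum(w * x for w, x in zip(weights, inputs)) - theta
    return sigmoid(net)


def layer_output(inputs, weight_rows, thetas):
    """Outputs of a layer of PEs that all receive the same input vector."""
    return [neuron_output(inputs, row, th)
            for row, th in zip(weight_rows, thetas)]


# Two PEs fed by the same two inputs (illustrative numbers).
hidden = layer_output([1.0, 0.0], [[2.0, -1.0], [-2.0, 1.0]], [2.0, 0.0])
```

Stacking calls to `layer_output`, with the outputs of one layer used as the inputs of the next, gives the layered feed-forward arrangement described above.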
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2.3.  ANN  operation 

The  neural  network  operation  depends  largely  on  its  "learning"  process. 
Learning  is  the  process  of  adapting  or  modifying  the  connection  weights  in 
response  to  stimuli  being  presented  at  the  input  and  possibly  the  output. 

A  stimulus  at  the  output  corresponds  to  a  desired  output  or  response  to  a 
given  input,  in  which  case  the  learning  is  supervised.  If  no  desired  output 
is  shown,  the  learning  is  called  unsupervised  learning. 

In  summary,  neural  networks  in  general  are  nonprogrammed  adaptive 
systems  which  process  information  in  response  to  an  excitation.  Neural 
networks  are  normally  trained  to  respond  to  an  input  and  are  adaptive  or 
self-organising,  i.e.,  they  learn  to  solve  problems  purely  on  the  basis  of  the 
training data presented to them. Several neural network paradigms, such
as the multi-layer perceptron, the Hopfield net, back propagation, Kohonen's
self-organising neural nets, etc., have been proposed to solve pattern
recognition and other signal processing problems.

3.  Pattern  recognition 

Neural  networks  are  being  applied  to  process  a  wide  variety  of  sensor  data. 
The  sensor  may  be  a  microphone,  a  pair  of  electrodes,  a  radar,  a  TV  or  an 
infra-red  camera,  etc.  The  main  aim  of  processing  sensor  signals  is  to  extract 
information  about  the  signal  source  and/or  the  medium  through  which  the 
signals have travelled before detection by the sensors. One aspect of sensor
signal processing is to detect and recognise certain "known" features of the
signal. This process is often termed pattern recognition. Examples include:

• recognising a particular word or a speaker from speech signals;

• detecting waveform shapes (ECG waveforms);

• detecting and identifying a radar signature from a particular aircraft;

• recognising a particular class of ship using infra-red images, etc.

The recognition process may be illustrated schematically as stages in a
signal processor, as indicated in Table 3.1. The functions in each block differ
significantly depending on the type of sensors used.
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3.1.  Pattern  recognition  in  images 

Pattern recognition in images largely depends on two important factors:
(a) good feature extraction techniques and (b) feature recognition and
classification. A number of image processing techniques and tools are
available today to improve the quality of acquired images and thereby
enhance the feature extraction process. The feature extraction process
depends on the nature of the image data and application. Conventional
classifiers treat patterns as points in a multi-dimensional measurement
space, using relationships such as joint probability distribution, interpoint
distances and scatter matrices to define classes. Neural networks have been
shown to be suitable for solving pattern recognition problems. The following
sections briefly describe the application of two different neural network
structures to pattern recognition of infrared (IR) images of ships. One of
the neural networks is a self-organising type while the other is a
probabilistic neural network based on the Bayesian classifier. The
self-organising neural network considered here is based on adaptive
resonance theory (ART), proposed by Grossberg [3, 4]. According to Grossberg,
the adaptive resonance architectures are neural networks that self-organise
stable recognition codes in real time in response to arbitrary sequences of
input patterns. The probabilistic neural network (PNN) for classification
was introduced by Specht [8].

4.  Probabilistic  neural  network  (PNN) 

The PNN is a neural network implementation of the Bayesian classifier and
provides a general structure for solving pattern recognition problems. The
PNN is basically a three-layer feed-forward network that uses sums of
Gaussian distributions to estimate the probability density functions (PDF) for
various classes as learned from training data sets. Although the PNN
structure resembles the back propagation network (BPN), the activation
function of its processing elements is different from that of BPNs. In the
processing elements of the PNN, the commonly used sigmoidal activation is
replaced by one of a class of exponential functions. The PNN provides
probability and reliability measures for each of its classifications. A
generalised structure of the PNN is illustrated in Figure 4.1. The neurons in
the input layer simply fan out the input data to neurons in the pattern
layer, where the data are weighted, summed and passed through an exponential
activation function, as shown in Figure 4.2.

There are as many pattern units for each class as there are training
vectors for that class.
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Figure  4.1:  Generalised  PN'N  structure 


Figure  4.2:  Pattern  unit  in  PW 


5.  Parzen  estimator 

Parzen estimation is used to build the PDF over the feature space for each
category. The Parzen estimator used in the PNN is given below.

$$f_A(X) = \frac{1}{(2\pi)^{p/2}\sigma^p}\,\frac{1}{N}\sum_{i=1}^{N} PK_i(X) \tag{5.1}$$

where

$$PK_i(X) = \text{Parzen kernel} = \exp\bigl(-(X - X_{ai})^T (X - X_{ai})/2\sigma^2\bigr) \tag{5.2}$$

and

$f_A(X)$: probability density function for class A.

$X$: input pattern vector of dimension p.

$X_{ai}$: i-th training pattern from class A.

$\sigma$: smoothing parameter.

$N$: total number of training patterns.

$T$: represents transpose.

Equation (5.2) can be simplified as:

$$PK_i(X) = \exp\bigl((X^T X_{ai} - 1)/\sigma^2\bigr) \quad (\text{since } X^T X = X_{ai}^T X_{ai} = 1 \text{ for vectors normalised to unit length}) \tag{5.3}$$

The term $X^T X_{ai}$ is the dot product of the feature vector to be classified
with a training vector. If the input paths to a processing element in the
pattern layer have their weights set to the training vector, then the standard
summation produces that dot product. If the activation function of the
processing element is of the form

$$\exp\bigl((Z - 1)/\sigma^2\bigr) \tag{5.4}$$

where

$$Z = X \cdot W \tag{5.5}$$

and W is the training vector, the processing element then implements the
Parzen kernel as in (5.3). The summation layer of the PNN sums the Parzen
kernels for each class, and the output layer finally chooses the class with
the largest PDF value at the input. The output layer also includes weighting
to implement the a priori class probabilities, thus providing the full Bayesian
classification process.

Normally, the PNN calculates

$$f_A(X) = \sum_{i=1}^{N} PK_i(X) \tag{5.6}$$

when each class has the same number of training examples. The fixed term
in (5.1) can be set to have any value for each category for scaling purposes.
Some commercially available ANN software normally implements (5.6) [6].
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The  accuracy  of  PNN  depends  on  the  smoothing  factor  used  in  the 
estimation  of  Parzen's  PDF.  The  PNN  is  trained  by  placing  the  training 
samples  for  each  class  directly  in  memory  and  then  testing  the  network 
on the testing samples at different values of the smoothing factor $\sigma$. Training
involves finding an optimal value for the smoothing factor which gives the
peak accuracy.
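The pattern, summation and output layers described above can be sketched as follows. This is a minimal illustration, assuming unit-length feature vectors, the exponential activation $\exp((Z - 1)/\sigma^2)$ and equal a priori class probabilities; the function names and the toy training set are hypothetical.

```python
import math


def unit(v):
    """Scale a vector to unit length so that X.X = 1."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def pnn_classify(x, training, sigma):
    """`training` maps a class label to its list of training vectors.
    Returns (best_label, scores), where scores[label] is the class average
    of the kernel outputs (equal priors assumed)."""
    x = unit(x)
    scores = {}
    for label, patterns in training.items():
        total = 0.0
        for w in patterns:
            z = sum(a * b for a, b in zip(x, unit(w)))  # pattern unit: dot product
            total += math.exp((z - 1.0) / sigma ** 2)   # exponential activation
        scores[label] = total / len(patterns)           # summation unit
    return max(scores, key=scores.get), scores          # output unit


training = {"A": [[1.0, 0.1], [0.9, 0.2]], "B": [[0.1, 1.0], [0.2, 0.8]]}
label, scores = pnn_classify([1.0, 0.0], training, sigma=0.3)
```

"Training" in this sketch is just storing the vectors; the only free parameter is `sigma`, which would be chosen by scanning values and keeping the one with the best accuracy on the test vectors.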

6.  ART  neural  network 

The ART architecture is highly adaptive and evolved from the simpler
adaptive pattern recognition networks known as the competitive learning
models. Figure 6.1 illustrates the ART structure schematically [2].

Figure 6.1: ART structure

There are two classes of ART structures: ART1 handles binary input
patterns while ART2 can process both binary and analog patterns. The ART
system basically consists of two layers of neurons called F1 and F2. The
input sequence activates the neurons in the F1 layer and the activity passes
through synaptic connections (weights) to neurons in the F2 layer, where
they compete with each other and finally one node fires (winner-take-all).
The F1 layer is known as the feature detection layer and each neuron in the
F2 layer represents a different "category." Each neuron in F1 is connected
to every neuron in F2 by a bottom-up pathway and similarly a top-down
pathway exists between neurons in the F2 and F1 layers. The activated node
in the F2 layer reinforces activity in the F1 layer through top-down priming.
The neurons in F1 therefore receive two inputs. The flow of bottom-up and
top-down information leads to a resonance in neural activity. If a particular
feature is present in both the bottom-up and the top-down signals, then a
reinforcement of F1 occurs. The attentional gain control system regulates
the top-down signal when there is no bottom-up activity by setting the
gain low and thus stopping the resonance when there is no input. The
orienting system generates a reset signal to F2 whenever the input pattern to
F1 is considerably different from the top-down information. The orienting
system, or novelty detector, receives an inhibitory input corresponding
to the overall activity of F1 and an excitatory input from the input pattern,
thus keeping a vigilance on the input pattern and the categories. When a
novel pattern is detected, a reset signal is sent to F2, shutting off the active
neurons, and the network hunts for another neuron to be active through a
new resonance. When a new neuron in F2 becomes active, that node
codes the input pattern. The detailed description of the ART structure is well
documented by Carpenter and Grossberg [2, 1].
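The search, vigilance test and reset cycle can be illustrated with a heavily simplified, ART1-style clustering routine for binary patterns. This is only a sketch of the idea, not the ART2 network used later in this paper and not the full dynamics of [1, 2]: the choice function, the fast-learning rule (template becomes the intersection with the input) and the parameter `alpha` are common textbook simplifications assumed here.

```python
def art1_cluster(patterns, vigilance, alpha=0.001):
    """Simplified ART1-style clustering of binary patterns (lists of 0/1).
    Returns (assignments, prototypes)."""
    prototypes = []   # top-down templates, one per committed F2 node
    assignments = []
    for x in patterns:
        norm_x = sum(x)

        def choice(j):
            # Bottom-up choice function used to rank committed F2 nodes.
            w = prototypes[j]
            overlap = sum(a & b for a, b in zip(x, w))
            return overlap / (alpha + sum(w))

        order = sorted(range(len(prototypes)), key=choice, reverse=True)
        winner = None
        for j in order:
            overlap = sum(a & b for a, b in zip(x, prototypes[j]))
            # Vigilance test: does the template match enough of the input?
            if norm_x > 0 and overlap / norm_x >= vigilance:
                winner = j
                break
            # Otherwise this node is reset and the search continues.
        if winner is None:
            prototypes.append(list(x))   # commit a new F2 node
            winner = len(prototypes) - 1
        else:
            # Fast learning: keep only the features shared with the input.
            prototypes[winner] = [a & b for a, b in zip(x, prototypes[winner])]
        assignments.append(winner)
    return assignments, prototypes


patterns = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 0, 0], [1, 0, 0, 0]]
assignments, prototypes = art1_cluster(patterns, vigilance=0.8)
```

Raising `vigilance` makes the match test stricter, so more categories are created; lowering it merges more inputs into existing categories.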

7.  Application  of  ART  and  PNN  in  pattern  recognition 

Infra-red (IR) images of navy vessels have been used to test the relative
efficacies of ART and PNN in pattern recognition. Target images were obtained
by a thermal imager operating in the 8–12 µm region, recorded using a standard
videotape recorder, and then digitised using a frame grabber to a resolution
of 512 × 512 pixels with 8 bits per pixel depth.


7.1.  Image  processing 

An edge-based image processing technique to localise and segment the target
from its immediate surrounding background has been established in the
Guided Weapons Division. The technique prefilters the IR image with the
standard 3 × 3 median filter and then processes the image with the Prewitt
5 × 5 edge detector. The search for the targets is conducted in those regions
of the image that have relatively high edge strengths. Isothermal contours are
extracted from areas of interest by an adaptive contour tracing algorithm at
a number of thresholds. The details of the image processing steps for IR images
of ships can be found elsewhere [7].

7.2.  Feature  vector  extraction 

Two  methods  have  been  investigated  to  extract  feature  vectors.  The  fast 
Fourier  transform  (FFT)  and  the  ID  Hadamard  transform  (f  lT)  have  been 
applied  to  extract  the  feature  vectors  from  preprocessed  IR  images.  The  FFTs 
of  the  top  profiles  of  ships  (above  the  water  line  boundary)  are  computed 
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and  the  amplitudes  of  the  first  12  Fourier  coefficients  are  used  as  the  input 
feature  vectors  to  the  Neural  Network  classifiers. 

The 1D HT is applied to the whole of an object along its major axis
(normally the horizontal axis). This process is suitable for images in which
reflection of the vessel radiation from the sea surface is not a significant
consideration. Objects are extracted using a local thresholding technique
and the HT is applied between object extremities detected using a 1D
Marr-Hildreth vertical edge detector. The HT is generated by sampling the area
of the thresholded object through a series of binary masks generated from
Walsh functions [9]. Feature vectors in a form suitable for input to the ART2
and PNN are obtained by normalising the 8 lowest order components to
remove scale dependence and to retain a representation of their polarity.
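The Hadamard feature extraction can be sketched as below. Several details are
assumptions: the object is represented by a column-area profile whose length
is a power of two, "lowest order" is read as lowest sequency (fewest sign
changes), and the normalisation divides by the Euclidean norm of the retained
components, which removes scale while keeping each sign. The names
walsh_functions and hadamard_features are hypothetical.

```python
import math


def walsh_functions(n):
    """Rows of the n x n Sylvester Hadamard matrix, ordered by sign changes."""
    rows = [[1]]
    while len(rows) < n:
        rows = [r + r for r in rows] + [r + [-v for v in r] for r in rows]

    def sign_changes(row):
        return sum(1 for a, b in zip(row, row[1:]) if a != b)

    return sorted(rows, key=sign_changes)


def hadamard_features(profile, n_coeffs=8):
    """Normalised low-sequency Walsh-Hadamard coefficients of a 1D profile."""
    n = len(profile)
    if n < n_coeffs or n & (n - 1):
        raise ValueError("profile length must be a power of two >= n_coeffs")
    masks = walsh_functions(n)[:n_coeffs]
    coeffs = [sum(m * x for m, x in zip(mask, profile)) for mask in masks]
    norm = math.sqrt(sum(c * c for c in coeffs))
    return [c / norm for c in coeffs] if norm else coeffs
```

Two profiles that differ only in overall scale give the same feature vector,
while mirroring a profile flips the sign of the odd components, so the
polarity information is preserved.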


7.3.  Training  of  ART2  and  PNN 

Two sets of feature vectors were extracted using the FFT and 1D HT, as
described above, from 80 IR images of ships belonging to 4 different classes.
Each class of ship had 20 IR images obtained at various ranges. In each
class, 10 images were used as training data and the remaining 10 as test data.
The ART2 network was trained in an unsupervised incremental learning
mode on a training set containing 10 feature vectors of each of 4 classes of
ships. Thus, the ART2 network was trained separately on FFT and 1D HT
feature vectors.

The PNN was trained by placing the training vectors for each class directly
in memory and then testing the network on the test vectors at different
values of the smoothing factor.
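The training-by-storage scheme can be illustrated with a small Parzen-window
classifier. This is a sketch assuming the usual Gaussian kernel with smoothing
factor sigma; the data layout and the function name pnn_classify are
hypothetical and not taken from the study.

```python
import math


def pnn_classify(x, training, sigma):
    """Return the class whose stored vectors give the largest kernel average.

    training maps each class label to the list of its stored feature vectors;
    "training" the network amounts to filling this dictionary.
    """
    best_label, best_score = None, -1.0
    for label, vectors in training.items():
        score = sum(
            math.exp(-sum((a - b) ** 2 for a, b in zip(x, v)) / (2.0 * sigma ** 2))
            for v in vectors
        ) / len(vectors)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Running the same test vectors through pnn_classify for several values of
sigma mirrors the procedure described above; if the decisions do not change
as sigma varies, the classes are well separated in feature space.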


7.4.  Recognition  of  ships  by  ART2  and  PNN 

We have investigated the efficacy of the two neural network structures in
classifying IR images of ships belonging to 4 classes. We have also investigated
two techniques (FFT and 1D HT) to extract feature vectors from the
raw IR images. Both ART2 and PNN were presented with feature vectors
from the test data set. Figure 7.1 shows the confusion matrices yielded by
the ART2 in classifying the test data set (feature vectors were extracted using
both the FFT and the 1D HT). The ART2 results have been presented at the
Second Australian Conference on Neural Networks [7]. Refinements of the
training technique have resulted in a significant improvement in the results
from those reported earlier. The matrices show that the ART2
Neural Net was able to classify the feature vectors obtained by the 1D HT
far better than it could those obtained by the FFT. The ART2 classified 97.5%
of the 1D HT data, with 100% classification accuracy on the data it classified.

Table 7.1: Using FFT generated feature vectors.
Table 7.2: Using 1D HT generated feature vectors.

Figure  7.1:  Confusion  matrices  for  ART2  pattern  classification. 


The PNN was tested using the same set of feature vectors as the
ART2. The work on the PNN was carried out by the Department of
Electrical and Electronic Engineering at the University of Western Australia
through a research contract with the Guided Weapons Division for the purpose
of comparing the performances of ART2 and PNN. Figure 7.2 shows
the confusion matrices produced by the PNN in classifying 4 ship classes [10].
It is interesting to note that the PNN has also classified 97.5% (1D HT) of the
data correctly. Although the classification results of the PNN appear to be
the same as those of ART2, the PNN has put some ship classes into wrong
bins, thereby producing confusion. The ART2 did not cause such confusion
(for 1D HT).

Table 7.3: Using FFT generated feature vectors.
Table 7.4: Using 1D HT generated feature vectors.

Figure  7.2:  Confusion  matrices  for  PNN  pattern  classification. 
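The difference between confusing two classes and rejecting an input into an
unknown class can be made concrete with a short sketch. The counts used here
are invented purely for illustration (4 classes with 10 test vectors each, as
in the experiment, and one error of each kind); they are not the contents of
Figures 7.1 and 7.2. The function names and the label "unknown" are
hypothetical.

```python
def confusion_matrix(true_labels, predicted_labels, classes, unknown="unknown"):
    """Count (true class, assigned bin) pairs; unrecognised outputs go to `unknown`."""
    columns = list(classes) + [unknown]
    matrix = {t: {p: 0 for p in columns} for t in classes}
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p if p in classes else unknown] += 1
    return matrix


def summarise(matrix, unknown="unknown"):
    """Return (fraction correct, number confused, number rejected as unknown)."""
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(row[label] for label, row in matrix.items())
    rejected = sum(row[unknown] for row in matrix.values())
    confused = total - correct - rejected
    return correct / total, confused, rejected


classes = ["A", "B", "C", "D"]
truth = [c for c in classes for _ in range(10)]   # 40 hypothetical test vectors

rejecting = list(truth)
rejecting[0] = "unknown"      # one input placed in the unknown bin
confusing = list(truth)
confusing[0] = "B"            # one input placed in a wrong class

print(summarise(confusion_matrix(truth, rejecting, classes)))
print(summarise(confusion_matrix(truth, confusing, classes)))
```

Both hypothetical classifiers label 39 of 40 vectors correctly (97.5%), yet
only the second one produces confusion between classes, which is the
distinction the comparison above turns on.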

8.  Conclusions 

A brief introduction to Artificial Neural Networks has been presented. The
application of ANNs in pattern recognition has been examined. Two
Neural Network paradigms (ART2 and PNN) have been investigated,
yielding interesting results. The performances of the PNN and the ART2
have been compared. The PNN achieved classification accuracy similar to
ART2's, but the ART2 produced less confusion than the PNN, or none at all.

In the case of the 1D Hadamard transform, the ART2 produced no confusion
at all among the 4 ship classes. ART2 placed unclassified inputs into an
unknown class, thus creating a new class. This is a desirable feature. The PNN,
on the other hand, was insensitive to the value of sigma, meaning that the
classes are well separated. The data sets used are insufficient to draw definitive
conclusions. However, the preliminary investigation has established the
potential application of the ART2 and the PNN to pattern recognition of IR
images of ships.

The digital computer is extremely effective at producing precise answers
to well-defined questions. The nervous system accepts fuzzy, poorly
conditioned input, performs a computation that is ill-defined, and produces
approximate output. The systems are thus different in essential
and fundamentally irreconcilable ways. Our struggles with digital computers
have taught us much about how neural computation is not done:
unfortunately, they have taught us relatively little about how it is done.
Part of the reason for this failure is that a large proportion of neural
computation is done in an analog rather than in a digital manner.

Carver A. Mead (1989)

The actions of digital computers themselves depend vitally upon quantum
effects, effects that are not, in my opinion, entirely free of difficulties
inherent in the quantum theory. What is this 'vital' quantum
dependence? In modern electronic computers, the existence of discrete
states is needed (say, coding the digits 0 and 1), so that it becomes a
clear-cut matter when the computer is in one of these states and when
in another. This is the very essence of the 'digital' nature of computer
operation. This discreteness depends ultimately on quantum mechanics.

Roger  Penrose  (1989) 


1.  Overview 

The development of faster and more efficient computers in recent years has
been driven by a seemingly unending thirst for communication, interaction,
automation, control issues, information availability, and a yearning for new
understanding of the self-organization principles of ourselves and our
environment. The challenges of the future force us to create and study new
concepts of adaptive information processing and to implement, for faster
communication, novel computer architectures based on fundamental quantum
theoretical principles.

Until now the increased power has been driven largely by continued
refinements to microelectronic fabrication techniques, such as electronic
switches (miniaturized transistors) with higher switching speeds and associated
integrated circuits (ICs) with increased levels of integration on silicon
chips. Although the advancements in the IC hardwiring and packaging
functions have been significant, their prospects for continuing at the same
steady rates from very large scale integration (VLSI) to ultra large scale
integration (ULSI) are being dimmed by physical limitations associated with
further miniaturization. Limitations of electronics include:

• electromagnetic interference at high speed,

• distorted edge transitions,

• complexity of metal connections,

• drive requirements for pins,

• large peak power levels,

• impedance matching effects.

Electromagnetic interference arises because the inductances of two
current-carrying wires are coupled. Sharp edge transitions must be maintained
for proper switching, but higher frequencies are attenuated more strongly than
lower frequencies, resulting in sloppy edges at high speeds. The complexity
of metal connections on chips, on circuit boards, and between system components
affects interconnection topology and introduces fields and unequal
path lengths. This translates to signal skews that are overcome by slowing
the system clock rate so that signals overlap sufficiently in time. Large peak
power levels are needed to overcome residual capacitances, and impedance
matching effects at connections require high currents which result in lower
system speeds. Even if much smaller logic gates are produced by utilizing
new techniques such as X-ray lithography yielding an order-of-magnitude
reduction in feature size, the speed of the IC will be limited by the
interconnection delays between transistors. Unlike transistor response times, these
time lags are reluctant to scale down with size. As a result there is a factor of 10^3
disparity between the speed of the fastest electronic switching components,
presently transistors that can switch in 5 ps, and the clock rates of 5 ns of the
fastest digital electronic computers.

To ensure further progress, it is prudent not to rely upon continued
refinements to hardware and software implementations. In fact, computer
architects are turning to the design of parallel processors to continue the
drive toward faster and more powerful computers. The massively parallel
organization principles which distinguish analog neural systems from the
small scale interconnection architectures of special-purpose parallel electronic
processors and even more from the von Neumann architecture of
standard digital computer hardware are one of the main reasons for the
growing interest in neurocomputer science. Since presently some
areas of microelectronics are approaching their natural physical limitations,
it is necessary to examine other technologies that may offer denser, faster
communication between chips or logic gate arrays or even provide alternatives
to the gates themselves. If light could be used to transfer data between
chips or gates, the interconnection-delay problems of electronics would be
avoided and the communication would occur at the speed of light itself.

The system of linear interconnects by which nonlinear processing elements
can share information among themselves is the most important component
of any parallel computer. Just as photonics is becoming the technology
of the future for telecommunication and machine-to-machine interconnects,
it also penetrates computer hardware and affects communications
within a single computer, especially for processor-to-processor interconnects
and board-to-board interconnects of parallel computer architectures, and
even more of neurocomputer architectures. Not only does coherent light
emitted by a laser have a much higher information capacity than electrical
hardwires, but due to their boson character optical photons do not interfere
during free-space propagation. Thus optical beams can occupy the same
region of space without mutual interaction, allowing data streams to pass
through one another without crosstalk and quantum interference, hence
allowing multiple signals to travel the same channel in parallel. Indeed, a
good lens can image tens of thousands of fully resolved points from one
plane to another, each of these parallel channels having a theoretical
bandwidth far in excess of 1 THz. Thus a single lens could easily carry all the
telephone conversations simultaneously going on in the world at any moment
in time. In this way, holographic optical interconnect technology leads
to a very high packing density, to a simplified connection complexity, and to
reduced drive requirements.

In summary of these arguments, optical technology includes the following
advantages:

• high connectivity through coherent imaging,

• no physical contact for interconnects,

• non-interference of free-space propagating signals,

• high spatial and temporal bandwidth,

• no feedback to the power source,

• inherently low signal dispersion.

High  bandwidth  is  achieved  in  space  because  of  the  non-interference  of 
optical  signals,  and  high  bandwidth  is  achieved  in  time  because  propagating 
wavefronts  do  not  mutually  interact.  There  is  no  feedback  to  the  power 
source  as  in  electronics,  so  that  there  are  no  data  dependent  loads.  Finally, 
inherently  low  signal  dispersion  means  that  the  shape  of  a  pulse  as  it  leaves 
its  source  is  virtually  unchanged  when  it  reaches  its  destination. 

At the interchip level of the interconnection hierarchy, two types of
holographic optical interconnects are available: free-space and guided-wave
optical technology. In the free-space type, a large array of optical signal
beams emitted from a light source is distributed by imaging it to a planar
array of optical detectors using a holographic optical element (HOE) as a
flat light-diffracting device. This type of holographic optical interconnect is
three-dimensional and provides flexible implementation of wiring schemes
which are impossible to fabricate with conventional refractive/optical
technology. However, it requires space because the HOE must be located above
the arrays in the optical module. In the guided-wave type, an optical signal
is transmitted and distributed from a coherent light source to an array of
optical detectors via a guided-wave optical medium such as optical fibers
or integrated waveguides. This type of holographic optical interconnect is
more compact and more mechanically stable than the free-space type. Its
compactness is due to the planar layout of the optical elements and the use
of small waveguides. Its stability is due to the fact that all optical elements
are fixed on a common substrate. A hybrid integration procedure then
enables integration of laser diodes and photodiodes at any position on the
guided-wave circuit surface.

As an example of massively parallel extrinsic connections between
hologram planes implemented by free space optics, the information throughput
in one cycle of the AT&T Bell Laboratories' looped digital optical pipeline
processor is higher than that of all the hardwired telephone nets in the
whole world together. The processor is based on a family of optical modules
which use split-and-shift hologratings to create, from a pair of laser beams,
the array of power supply beams that read chips containing large arrays of
self-electro-optic effect devices (SEEDs) of 5 µm square. It operates at 1 million
cycles per second, slower than most personal computers, but the most
optimistic perspectives predict the implementation of an all-optical processor
operating at several hundred million cycles per second, faster than most
supercomputers, within the next five years.

Remark 1.1. The SEED processing elements form very-low-energy electro-optic
modulators and optical logic gates based on multiple quantum well
(MQW) structures that are fabricated by gallium arsenide (GaAs)/gallium
aluminum arsenide (GaAlAs) technology utilizing molecular beam epitaxy
(MBE). Large arrays of a family of SEEDs control the intensity of a beam of
850 nm wavelength laser light passing through them by making use of the
quantum confined Stark effect (QCSE). This quantum phenomenon causes
an electrical voltage of a few volts applied normal to the plane of the quantum
wells to decrease the material's ability to absorb light at 850 nm wavelength.
Thus an electrical signal can be converted to an optical signal carried by a
laser beam. In a symmetric SEED (S-SEED) the quantum-well material is
grown inside the intrinsic region of a PIN photodiode structure, and two
such diodes are connected in series with a dc bias voltage applied. If laser
light is incident on one of the diodes, a photocurrent is generated, the other
diode acts as an electrical load, and the voltage across the first diode drops.
This voltage drop causes an increase in optical absorption via the QCSE, thus
generating more carriers and increasing the photocurrent. Positive feedback
ensues, and the S-SEED switches into a stable state in which the first diode
has a maximum absorption and transmits only a little light, while the second
diode has low absorption and high transmission. If a higher light intensity is
applied to the second diode, the S-SEED will flip into the opposite state with
the second diode absorbing and the first diode transmitting. The logic state
of the device depends only upon the ratio of the input beams, and the state
of the device can be read out by a pair of equally powered beams without
altering its condition, thus providing time-sequential gain. In this way the
S-SEEDs can be used as fully cascadable optical differential logic devices with
low switching energies (currently subpicojoule) and the potential speed of a
billion operations per second. In fact, they are the first optical logic gates that
are competitive with microelectronic processing elements in terms of switch
energy and cascadability. Arrays of thousands of highly uniform S-SEEDs
contained in GaAs chips can be addressed by arrays of laser beams incident
normal to the plane of the chip. The beams can set and read out the logic
state of the gates and transfer the data to subsequent similar arrays using
imaging optics [83, 130, 131].
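The set and read-out behaviour described in the Remark can be mimicked by a
deliberately crude toy model. It is a sketch only: it treats the S-SEED as an
ideal latch in which the diode receiving more light becomes the absorbing one,
it ignores the hysteresis and switching energy of a real device, and the
transmission fractions 0.2 and 0.8 as well as the class name SSeedToy are
hypothetical values chosen for illustration.

```python
class SSeedToy:
    """Idealised two-diode latch: state set by beam ratio, read by equal beams."""

    def __init__(self, low=0.2, high=0.8):
        # Hypothetical transmission of the absorbing (low) and the
        # transparent (high) diode; not measured device data.
        self.low, self.high = low, high
        self.absorbing = 1          # index (1 or 2) of the absorbing diode

    def write(self, p1, p2):
        """Set the state from two input beam powers; equal powers change nothing."""
        if p1 > p2:
            self.absorbing = 1
        elif p2 > p1:
            self.absorbing = 2
        return self.absorbing

    def read(self, power=1.0):
        """Transmitted powers for a pair of equal read beams; state is unchanged."""
        t1 = self.low if self.absorbing == 1 else self.high
        t2 = self.low if self.absorbing == 2 else self.high
        return power * t1, power * t2
```

Writing with weak beams and reading with a stronger pair of equal beams gives
output beams whose ratio encodes the stored state at higher power, which is
the time-sequential gain mentioned in the Remark.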

Another example of massively parallel extrinsic connections between
hologram planes is the guided-wave optical interconnection technology used
in the field of amacronics. Amacronic structures are hybrid analog neural
processors formed by layers of optics, electronics, and detector arrays
organized in a parallel way similar to the amacrine clustered processing layers in
front of the human retina. Like diurnal insects, amacronic sensor technology
may eventually be able to dynamically trade sensitivity for resolution.

All holographic optical interconnect technology rests on the fundamental
fact that in the quantized theory of the electromagnetic field the
bosons (integral-spin particles) present in a beam of coherent light traveling
in a well-defined direction are the photons. For light quanta, however,
quantum parallelism occurs, according to which different alternatives at the
quantum level are allowed to coexist in quantum complex linear superposition.
The key idea of quantum holography is to mathematically model the
quantum parallelism by the Kirillov quantization. This procedure allows one
to identify in a first step the hologram plane with the three-dimensional
Heisenberg nilpotent Lie group quotiented by its one-dimensional center,
then to restrict in a second step the sesquilinear holographic transform

\[ \psi(t')\,dt' \otimes \varphi(t)\,dt \;\longmapsto\; H_1(\psi, \varphi; x, y) \cdot dx \wedge dy \]

to  the  holographic  lattices  which  form  two-dimensional  pixel  arrays  inside 
the  hologram  plane,  and  finally  to  recognize  in  a  third  step  the  hologram 
plane  as  a  neural  plane  of  local  neural  networks. 

Quantum or photon holography as a part of quantum optics or photonics
is the procedure of mathematically modeling the quantum parallelism
by the Kirillov quantization. It allows a unified approach to planar optical
components of digital optical computers and analog amacronic processors.
Based on the beam splitter quantum interference experiment as an elementary
building unit, the quantum holographic approach is also applicable to
the Soffer optical resonator and the optical processing of synthetic aperture
radar (SAR) data which represent particularly important examples of optical
neurocomputer architectures.

Utilizing a unified quantum holographic approach to artificial neural
network models implemented with coherent optical, optoelectronic, or
analog electronic neurocomputer architectures, the paper establishes a new
identity for the matching polynomials of complete bichromatic graphs which
implement the intrinsic connections between neurons of local networks
located in the neural plane. In microoptics and nanotechnology, the quantum
theoretical treatment of optical holography is imperative because it involves
only small differences of energy and not because atoms coherently excited
by short laser pulses may be as large as some transistors of microelectronic
ICs and the pathways between them inside the hybrid VLSI neurochips of
amacronics. Actually, quantum effects can occur over distances of several
meters or even billions of light years for quasars.
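For readers unfamiliar with matching polynomials, the following sketch
computes their coefficients for small cases. It assumes that "complete
bichromatic graph" means the complete bipartite graph K(m, n), and it uses
the standard count of k-edge matchings in K(m, n), namely
C(m, k) C(n, k) k!. It does not reproduce the new identity announced in the
text, which is not stated in this section. The function names are
hypothetical.

```python
from itertools import combinations
from math import comb, factorial


def matching_numbers_closed_form(m, n):
    """Number of k-edge matchings of K(m, n) for k = 0 .. min(m, n)."""
    return [comb(m, k) * comb(n, k) * factorial(k) for k in range(min(m, n) + 1)]


def matching_numbers_brute_force(m, n):
    """Same numbers, found by enumerating edge subsets with no shared vertex."""
    edges = [(i, j) for i in range(m) for j in range(n)]
    counts = []
    for k in range(min(m, n) + 1):
        count = 0
        for subset in combinations(edges, k):
            left = {i for i, _ in subset}
            right = {j for _, j in subset}
            if len(left) == k and len(right) == k:
                count += 1
        counts.append(count)
    return counts


def matching_polynomial(m, n, x):
    """Matching generating polynomial: sum over k of (k-matchings) * x**k."""
    return sum(c * x ** k for k, c in enumerate(matching_numbers_closed_form(m, n)))
```

For K(3, 3) both routines return 1, 9, 18, 6, that is, one empty matching,
nine single edges, eighteen pairs of disjoint edges, and six perfect
matchings.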

Until recently optical computing was looked upon as an alternative
technology for performing an old task. Now, a paradigm shift is coming
about as a result of the realization that optical computers are fundamentally
different from, and in many senses superior to, any electronic
computer. Certain optical computers are the only available ones that are
intrinsically quantum mechanical processors.

H.  John  Caulfield  and 
Joseph  Shamir  (1990) 

Not enough has been written about the philosophical problems involved
in the application of mathematics, and particularly of group theory,
to physics. On the one hand, mathematics is created to solve specific
problems arising in physics, and, on the other hand, it provides the
very language in which the laws of physics are formulated. One need
only think of calculus or of Fourier analysis as examples of this dual
relationship.

We  are  all  familiar  with  the  exploitation  of  symmetry  in  the  solution 
of  a  mathematical  problem.  On  the  other  hand,  the  very  assertion  of 
symmetry  is  often  the  most  profound  formulation  of  a  physical  law  or 
the  key  step  in  the  development  of  a  new  theory. 

V.  Guillemin  and  S.  Sternberg  (1984) 


2. Introductory comments

Real-time image analysis and processing, computer vision, automatic target
recognition for intelligent robots, remote surveillance, autonomous navigation,
sound localization, speech processing and understanding, smart sensor
processing, and various other application areas of artificial intelligence
(AI) need to process an immense amount of data at very high speed.
The computational power required exceeds by many orders of magnitude
the capabilities of sequential digital computers. The Space Station program's
Earth Observing System (Eos) polar orbiting platforms, for instance, are required
to process data at rates of up to 1.5 gigabits per second. High resolution color
images  running  at  frame  rates  as  low  as  30  frames  per  second  will  require  1 0® 
bits  per  second.  If  one  adds  some  form  of  autonomous  feature  identification 
to  the  system,  the  processing  requirement  will  be  between  10'°  and  10’® 
operations  per  second.  As  a  final  example,  a  human-like  speech  recognizer 
must  simultaneously  perform  phonetic,  phonemic,  syntactic,  semantic,  and 
pragmatic  analyses  of  its  inputs  and  match  them  to  5  •  1  O'*  words  in  real  time. 
These  processing  throughputs  exceed  even  the  most  optimistic  projections 
for  sequential  supercomputers. 

The  problem  of  large-volume  and  high-speed  computations  can  be 
solved  by 

•  data  compression  techniques, 

• parallel data processing.

Since their very beginning, artificial neural networks have been considered
as massively parallel computing paradigms. Indeed, neural nets
offer the potential of providing massive parallelism, adaptation to dynamic
data structures, and new algorithmic approaches to problems in image
processing, computer vision, speech recognition, robotic control, and knowledge
processing, among other application fields of AI. Even the fastest sequential
digital electronic computers (including advanced parallel architectures)
typically require processing times ranging from many minutes to several hours
for non-complex low-level image processing tasks on large image arrays.

The advantages of neural computation are now widely recognized and
neural networks form one of the most rapidly expanding areas of contemporary
research. In fact, research in neurocomputer science, stimulated
by major advancements in neurophysiological studies, neurosynergetic
understanding, optoelectronic technology, molecular engineering, and
bioelectronic material processing, is currently in the midst of a gold rush period, an
intense period of rapid discovery and exploitation. Everywhere new veins
of gold are being uncovered and mined by thousands of prospectors, most
of whom have crossed over into this exciting new research area from a diversity
of other disciplines: neurobiology, neurosynergetics, quantum physics,
imaging optics, electrical engineering, mathematics, and computer science.

The fundamental characteristics of all known neurocomputer architectures
are the linear synaptic interconnections between simple nonlinear
processing elements, called neurons, to form a concurrent distributed processing
pattern of extensive connectivity. The processing units like the S-SEED logic
devices (cf. Remark 1.1 supra) are arranged as two-dimensional arrays of
neurons in the neural plane. Information is stored in the neurocomputer
almost exclusively in the interconnection pattern, called the neural network.
Large scale (LS) collective systems like artificial neural networks exhibit
many properties, including robustness, reliability, and fault tolerance, an
ability to deal with ill-posed problems and noisy data, which conventional
digital computer architectures do not. Adopting the synergetic point of
view, neurobiology provides existence theorems on effectiveness of neural
network parallel algorithms on appropriate problems.
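The structure just described, linear interconnections feeding simple
nonlinear elements, reduces in its most basic form to a weighted sum followed
by a nonlinearity. The sketch below assumes a hyperbolic tangent as the
nonlinearity and a plain list-of-lists weight matrix; both choices, and the
function name neuron_layer, are illustrative assumptions rather than anything
prescribed by the text.

```python
import math


def neuron_layer(weights, inputs, nonlinearity=math.tanh):
    """One layer of neurons: each output is nonlinearity(sum_j w[i][j] * x[j]).

    The rows of `weights` are the linear synaptic interconnections; all of the
    stored information lives in these numbers, not in the neurons themselves.
    """
    return [
        nonlinearity(sum(w * x for w, x in zip(row, inputs)))
        for row in weights
    ]
```

Presenting the same input to two layers that differ only in their weights
gives different outputs, which is the sense in which the interconnection
pattern, rather than the processing elements, carries the information.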

For artificial neural networks to become ultimately useful, neuromorphic
hardware must be developed. The effectiveness of neuromorphic hardware
is in direct proportion to the attention it pays to the guiding
neurobiological metaphor. Development efforts in the field of sixth generation
computers have concentrated on one of two goals: to build

• efficient hardware that effectively executes software simulations,

• actual hardware emulators for specific neural network models.

Examples of the first are the Hecht-Nielsen Neurocomputer (HNC)
accelerator board for conventional serial personal computers, and the Delta
board by Science Applications International Corporation (SAIC). HNC is
also pioneering a new computer language, AXON, that is designed for
programming digital computers to simulate advanced neural networks. An
important application of the SAIC neural network software simulation is the
detection of explosives in checked airline baggage: the luggage is bathed in
low energy (thermal) neutrons and the gamma rays resulting from neutron
absorption by atomic elements in the luggage are analyzed. The artificial
neural network software then searches for specific combinations of atomic
elements that characterize explosives including dynamites and water gels.

Examples of the second can be viewed hierarchically. On the simplest
level, the information is recorded and retrieved from an erasable
magneto-optic disk by optical techniques. Higher-level building-blocks are
two-dimensional arrays of coherent optical processors [2, 3, 4, 5, 7, 126, 127,
128, 129, 142, 143, 144, 145, 146] for the analog implementation of neural
network models by holographic optical interconnects, and neural network
analog VLSI chips. For instance, the analog silicon models of the
orientation-selective retina for pattern recognition [1, 115, 117], and the analog electronic
cochlea for auditory localization [92, 115, 116] belong to this category. The
amacronic and the cochlea VLSI neurochips are made with a standard
complementary metal oxide semiconductor (CMOS) process [189].

Although the implementation of the various neural network models
needs to overcome many difficult optical, optoelectronic, and analog
electronic design problems, their performance is modest compared with the
powerful organizing principles found in biological neural wetware. The
visual system of a single human being does more image processing than does the
entire world's supply of supercomputers, and the nervous system of even a
very simple animal like the common house fly (Musca domestica) contains
computing paradigms that are orders of magnitude more effective than are
those found in systems made by humans. Unlike conventional computer
hardware, neural circuitry is not hardwired or specified as an explicit set
of point-to-point connections. Instead it develops under the influence of a
genetic specification and epigenetic factors, such as electrical activity, both
before and after birth. How this happens is in large part unknown.

Neurobiological  development  processes  are  far  too  complex  to  hope 
that  a  relatively  complete  understanding  of  how  a  perceptual  system  de¬ 
velops  and  functions  will  soon  emerge.  But  we  are  familiar  with  complex 
synthetic  systems  whose  principles  of  neural  organization  can  be  understood 
without  one's  knowing  in  detail  how  the  components  work.  Furthermore, 
the  same  principles  can  be  used  to  build  neurocomputers  in  any  of  several 
different  technologies.  Presently  the  most  advanced  neural  network  ana¬ 
log  CMOS  VLSI  chips  model,  to  a  first  approximation,  the  time-frequency 
domain  processing  of  two  highly  spectacular  biological  neural  systems:  the 
active  echolocating  system  of  the  horseshoe  bats  (Rhinolophidae),  and  the 
passive auditory localization system of the barn owl (Tyto alba), both of
which produce complete maps of the auditory space from the time-frequency
coding pathways. Continuing evolution, however, of hybrid submicron
optoelectronic technology combined with neurocomputer science, the highly
promising  and  exciting  new  field  of  studying  how  computations  can  be  car¬ 
ried  out  in  extensive  networks  formed  by  two-dimensional  arrays  of  heavily 
interconnected  simple  processing  elements,  will  create  advanced  neurocom¬ 
puters  within  the  next  decade  which  will  be  able  to  solve  problems  intractable 
for  even  the  largest  conventional  digital  computers. 

This  paper  concentrates  on  a  unified  quantum  holographic  approach 
to  massively  parallel  coherent  optical,  optoelectronic,  and  analog  electronic 
neurocomputer  architectures.  Notice  that  the  borders  between  physical  op¬ 
tics,  electronics,  and  solid  state  physics  are  getting  more  and  more  fuzzy.  The 
primary  assumption  of  quantum  or  photon  holography  is  that  the  energy 
transmitted  by  a  beam  of  coherent  radiation  is  divided  into  discrete  wave 
packets, or photons, much as an electric current is made up of a flow of
electrons. Detailed analysis shows that the arrival times at a photodetector of
photons from a classical coherent radiation source, such as a laser, exhibit the
same Poissonian statistics as does the thermionic emission of electrons from
the  hot  cathode  of  a  vacuum  tube.  Thus,  the  photocurrent  exhibits  fluctua¬ 
tions  which  resemble  the  shot  noise  of  the  current  in  the  vacuum  tube.  The 
quantum  noise  produced  by  a  photoelectric  detector  is  therefore  an  intrinsic 
property  of  the  radiation  itself,  rather  than  of  the  photon  detector. 
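
The Poissonian counting statistics described above can be illustrated numerically. The following is a minimal sketch, not a model of any particular detector: it assumes an idealized source whose photon arrivals form a homogeneous Poisson process, and the names `photon_counts`, `rate`, and `window` are hypothetical. Under that assumption the ratio of variance to mean of the counts per window (the Fano factor) should be close to 1, which is the signature of shot noise.

```python
import random
import statistics


def photon_counts(rate, window, n_windows, seed=0):
    """Count photon arrivals per detection window for a Poisson process.

    rate      -- mean number of photons per unit time (assumed constant)
    window    -- duration of one counting window
    n_windows -- number of consecutive windows to simulate
    """
    rng = random.Random(seed)
    counts = [0] * n_windows
    total_time = window * n_windows
    # Inter-arrival times of a Poisson process are exponentially distributed.
    t = rng.expovariate(rate)
    while t < total_time:
        index = min(int(t // window), n_windows - 1)
        counts[index] += 1
        t += rng.expovariate(rate)
    return counts


counts = photon_counts(rate=20.0, window=1.0, n_windows=5000)
mean_count = statistics.mean(counts)
variance = statistics.pvariance(counts)
fano_factor = variance / mean_count  # close to 1 for shot noise
print(f"mean = {mean_count:.2f}, variance = {variance:.2f}, "
      f"Fano factor = {fano_factor:.3f}")
```

The fluctuations here come entirely from the arrival process itself; no detector noise is modeled, in line with the remark that the quantum noise is a property of the radiation.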

The concept of photon arises from the quantization of the electromagnetic
field. The spatial part of Maxwell's equations in vacuo
(differential 2-form $F = B + E \wedge dt$ on the Lorentz space $\mathbb{R}^4$, $d$ = exterior
derivative, $*$ = Hodge star operator on $\Lambda^2(\mathbb{R}^4)$) and the boundary conditions
determine the possible frequencies $\nu$, direction and spatial wavefront
profile of the field, collectively referred to as the mode. The field modes are
described by wave vectors $k \neq 0$ in $\mathbb{R}^3$. In spite of warnings, the
suggestion is sometimes made that the Maxwell waves form the wave functions of the
photon. This suggestion, however, is false. Although a single photon
can have a definite position at one time, it is impossible to construct a position
operator for the photon. Therefore it cannot have definite positions at
all times in a specified time interval, i.e., it cannot have a definite trajectory
(Bohr's indeterminacy principle; see [139, 180]). Since in the presence of
sources, photons can be absorbed or emitted, one cannot introduce a linear
Schrödinger evolution equation for a single photon. In fact, the modes of the
electromagnetic field must be quantized and photons of energy $h\nu$ then occur
as elementary excitations driven by the quantized Maxwell field modes
of frequency $\nu$ and label $k$.

In  quantum  optics  or  photonics,  the  quantization  of  the  Maxwell  field 
modes  is  done  by  expressing  the  time  dependent  part  of  Maxwell's  equations 
in  the  form  of  the  equation  of  motion  for  a  classical  harmonic  oscillator  and 
then  replacing  the  classical  harmonic  oscillator  by  its  quantum-mechanical 
counterpart.  In  this  way,  the  electromagnetic  field  is  considered  as  an  as¬ 
semblage  of  driven  harmonic  oscillators.  As  a  consequence,  the  energy  of 
the  radiation  field  is  quantized  and  the  quanta  are  referred  to  as  photons. 
The state of the field can be expressed in terms of number states $|n_k\rangle$, which
are states with $n_k$ quanta occupying the mode $k$. These number states are
eigenstates of the Hamiltonian of the quantum-mechanical harmonic oscillator.
However, they are not a realistic description of a coherent radiation
field, as emitted by a laser. One formal series of number states, the so-called
coherent state $|\alpha_k\rangle$, is used to represent the coherent radiation field produced
by an ideal source such as an ideal laser operating well above threshold. In
the Dirac notation, the Glauber formal series expansion [134]
describes a state $|\alpha_k\rangle$ where the probability of finding the mode occupied by
$n_k$ photons exhibits a Poisson distribution about a mean of $|\alpha_k|^2$ with a width
$|\alpha_k|$. Ignoring the slowly time varying phase diffusion, the coherent state is
considered to be a good approximation to the field produced by a coherent
radiation source such as a laser.
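
The Poisson statistics of the coherent state can be checked with a short computation. The sketch below assumes the standard textbook form of the Glauber expansion, with number-state amplitudes $c_n = e^{-|\alpha|^2/2}\,\alpha^n/\sqrt{n!}$; this form is supplied here as an assumption and is not quoted from the text. The function name `coherent_state_amplitudes` and the chosen value of `alpha` are hypothetical.

```python
import math


def coherent_state_amplitudes(alpha, n_max):
    """Amplitudes c_n = exp(-|alpha|^2 / 2) * alpha**n / sqrt(n!) for n <= n_max."""
    amplitudes = []
    c = complex(math.exp(-abs(alpha) ** 2 / 2.0))  # c_0
    for n in range(n_max + 1):
        amplitudes.append(c)
        # Recurrence c_{n+1} = c_n * alpha / sqrt(n + 1) avoids large factorials.
        c = c * alpha / math.sqrt(n + 1)
    return amplitudes


alpha = complex(2.0, 1.0)  # hypothetical amplitude with |alpha|^2 = 5
amplitudes = coherent_state_amplitudes(alpha, n_max=60)
probabilities = [abs(c) ** 2 for c in amplitudes]

total = sum(probabilities)
mean_n = sum(n * p for n, p in enumerate(probabilities))
var_n = sum((n - mean_n) ** 2 * p for n, p in enumerate(probabilities))
width = math.sqrt(var_n)
print(total, mean_n, width)
```

With these assumptions the mean photon number comes out as $|\alpha|^2$ and the width as $|\alpha|$, up to the negligible truncation at `n_max`.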

A mathematical description of photons or, more generally, of bosons
is given by the Bargmann-Fock model of quantics [23, 78]. Actually, the
Bargmann-Fock model is based on the quantum-field-theory annihilation
and creation operators for bosons and therefore on harmonic analysis of
the three-dimensional isotropic Heisenberg nilpotent Lie group G. A first
important advantage of the Heisenberg group approach is that it allows one to
define two-time averages, that is to say, mean values of mixed boson states
taken at two different times $t$ and $t' = t - x$ of fixed difference $t - t' = x$.
A  second  important  advantage  is  that  the  Kirillov  quantization  reveals  the 
planar  geometry  of  the  unitary  dual  of  the  Heisenberg  group  G,  that  is  to 
say,  the  planarly  multilayered  structure  of  the  unitary  dual  manifold.  To 
each  flat  layer,  the  Kirillov  correspondence  associates  representations  of  G 
of linear Schrödinger type, or representations of G of linear Fraunhofer
type, respectively. Because in the quantized theory of the electromagnetic
field the bosons present in a beam of coherent light traveling in a well-
defined direction are the photons, the Kirillov quantization approach allows
one to model, among other basic optical phenomena, the quantum parallelism
performed by the beam splitter quantum interference experiment and optical
holography, the functionality of holographic optical interconnects, optical
phase conjugation, three-dimensional planarly multilayered optical devices
like display holograms, and spatial light modulators (SLMs).

Display  holograms  are  probably  one  of  the  most  impressive  realizations 
of  the  unitary  dual  manifold  of  G.  Starting  with  the  beam  splitter  quantum 
interference experiment as an elementary building unit, the key idea of quantum
holography is to identify the symplectic hologram plane $\mathbb{R} \oplus \mathbb{R}$ with
G quotiented by its one-dimensional center $C_G$ and to recognize via the
Kirillov quantization procedure the symplectic hologram plane $\mathbb{R} \oplus \mathbb{R}$ as a
neural plane of local neural networks. As a result, harmonic analysis on the
central projection G-slice $G/C_G$ provides filtered backpropagation formulae
which are at the base of the holographic reciprocity principle. Moreover, it
gives rise to the elementary holograms and the Gabor wavelets which form
total families of approximating functions in $L^2(\mathbb{R} \oplus \mathbb{R})$ of decorrelating and
correlating code primitives of artificial neural networks. The neural network
implemented in the symplectic hologram plane $\mathbb{R} \oplus \mathbb{R}$ explains the
robust "associative" optical memory realized by optical holograms by the
distributed  nature  of  holographic  recordings.  Finally,  a  series  of  new  identi¬ 
ties  for  theta-null  values  which  arises  from  artificial  neural  network  identities 
shows  that  studies  in  computational  mathematics  combined  with  synthetic 
neurobiology  may  have  an  unexpected  spin-off  in  pure  mathematics. 

Emphasis  throughout  the  paper  is  placed  on  the  application  of  quantum 
holography  to  neural  computer  architectures.  For  the  fairly  deep  details  of 
the  Mackey  machinery  and  the  Kirillov  quantization  procedure  underlying 
the harmonic analysis of the half-line bundle G over the two-sided symplectic
hologram plane $\mathbb{R} \oplus \mathbb{R}$, the reader is referred to the monograph [155].
Technological details of the implementations are described in the references
indicated below and in the references listed in the monograph [52].

The double-slit experiment is a phenomenon which has in it the heart
of  quantum-mechanics;  in  reality  it  contains  the  only  mystery  of  the 
theory. 

Richard  P.  Feynman  (1918-1988) 

In  any  attempt  of  a  pictorial  representation  of  the  behaviour  of  the  photon 
we would meet with the difficulty: to be obliged to say, on the one hand,
that  the  photon  always  chooses  one  of  the  two  ways  and,  on  the  other 
hand,  that  it  behaves  as  if  it  had  passed  both  ways. 

Niels  Bohr  (1885-1962) 


The  unitary  Schrodinger  evolution  is  totally  deterministic,  maintains 
quantum  complex  superposition,  and  acts  in  a  continuous  way,  but 
the  completely  different  procedure  of  forming  the  squared  moduli  of 
quantum  amplitudes  and  only  this  non-deterministic  reduction  of  the 
state-vector  (or,  as  it  is  sometimes  graphically  described:  collapse  of  the 
wavefunction) introduces uncertainties and probabilities into quantum
theory. It is a probabilistic law which grossly violates quantum complex
superposition  and  is  blatantly  discontinuous. 

Roger  Penrose  (1989) 


3.  Quantum  holography 

The  vertebrate  vision  system  is  perhaps  the  most  complex  neural  assembly 
known.  Although  more  details  are  known  about  vision  than  about  any  other 
neural system, it is still far from fully understood. On the deepest level of
molecular operation, visual imaging is a quantum process. A solid object
is seen because light scattered by the object causes chemical changes in the
retinal  cells  of  the  eye.  The  eye  is  quite  a  good  light  detector:  Experiments 
have  shown  that  the  vertebrate's  retinal  rod  photoreceptors  can  respond  to 
the  absorption  of  even  a  single  photon  [14].  In  general,  however,  many 
photons  are  absorbed  by  the  eye  without  reaching  a  light  sensitive  cell. 
For  this  reason  only  a  few  photons  in  every  hundred  that  enter  the  eye 
are  detected.  Obviously  the  chemical  changes  involved  in  seeing  an  object 
must  be  reversible.  In  fact,  the  cell  reverts  to  its  normal  state  after  about 
one-tenth  of  a  second.  It  is  this  short  light  storage  period  that  limits  the 
sensitivity  of  the  eye  for  detecting  faint  objects.  Photography  can  overcome 
this  limitation  of  the  eye  by  storing  the  changes  in  a  permanent  way  on 
photographic  emulsion. 

Photographic  emulsion  consists  of  individual  grains  of  a  silver  halide 
compound,  in  which  the  silver  atoms  are  ionized.  When  a  photon  is  absorbed 
by the photographic emulsion, an electron is emitted, in the same way as
electrons are knocked out of a metallic surface in the photo-electric effect.
This  electron  can  now  be  attracted  by  a  silver  ion  to  form  a  neutral  atom  of 
silver.  Left  to  itself,  the  neutral  silver  atom,  surrounded  by  the  ionic  silver 
halide  compound,  is  unstable,  and  will  eventually  eject  the  electron  and 
revert  back  to  an  ion.  However,  if  before  this  happens,  other  photons  have 
produced  several  other  neutral  silver  atoms  nearby,  a  stable  development 
center  consisting  of  a  small  number  of  atoms  can  be  formed.  In  contrast, 
each  grain  of  the  photographic  emulsion  contains  billions  of  silver  ions. 
However,  when  the  photographic  emulsion  is  developed,  this  assembly  of 
neutral  silver  atoms  induces  all  the  remaining  silver  ions  in  the  grain  to  be 
deposited as opaque metallic silver.


Figure  3.1 

Figure 3.1 shows several photographs of the same person taken at
different  exposures.  In  the  top  left  picture  about  3  lO’  photons  enter  the 
camera.  Most  of  these  photons  are  absorbed  without  causing  permanent 
change  in  the  photographic  emulsion.  It  is  evident  that  3  10^  photons  are 
not  enough  to  generate  a  recognizable  image  and  the  photograph  appears 
like  random  clusters  of  light  dots.  However,  when  the  exposure  increases 
the  number  of  photons  entering  the  camera  increases.  The  top  right  picture 
involves  about  10''  photons  and  already,  although  there  is  no  clear  image,  a 
blurred  impression  of  an  image  is  beginning  to  show  up  within  the  clusters  of 
light dots. The improvement continues as the number of photons increases,
and the final exposure involves more than 3- 10^ photons. In this last picture,
the  image  intensity  seems  to  vary  smoothly  although  it  is  built  up  out  of 
individual  development  centers  created  by  the  arrival  of  individual  photons. 
Although in the lowest exposure photograph the positions of the bright dots,
signifying the presence of a development center in that grain of the emulsion,
seem to be random, they are not. Centers are more likely to develop in places
where the image will eventually be bright. Thus, even in a photograph, the
quantum theoretical, probabilistic nature of light detection can be seen. It is
not possible to predict with certainty where any particular photon will land,
or in which grain of the photographic emulsion a development center will
be produced.
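
The build-up of an image from individually unpredictable photon arrivals can be imitated with a small simulation. This is only a sketch under simplifying assumptions: the "image" is a hypothetical one-dimensional intensity profile on 50 pixels, every photon is detected, and each photon lands in a pixel with probability proportional to the intensity there. The names `expose` and `rms_error` are illustrative.

```python
import math
import random


def expose(intensity, n_photons, rng):
    """Distribute n_photons over pixels with probability proportional to intensity."""
    hits = [0] * len(intensity)
    for pixel in rng.choices(range(len(intensity)), weights=intensity, k=n_photons):
        hits[pixel] += 1
    return hits


def rms_error(hits, intensity):
    """RMS difference between the normalized photon histogram and the normalized intensity."""
    n = sum(hits)
    total = sum(intensity)
    squared = [(h / n - i / total) ** 2 for h, i in zip(hits, intensity)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical one-dimensional "image": a smooth pattern on 50 pixels.
intensity = [0.1 + math.cos(math.pi * k / 10.0) ** 2 for k in range(50)]
rng = random.Random(1)
error_low = rms_error(expose(intensity, 3_000, rng), intensity)
error_high = rms_error(expose(intensity, 300_000, rng), intensity)
print(error_low, error_high)
```

At the low exposure the histogram is dominated by statistical scatter, while at the high exposure it follows the intensity profile closely, although no individual landing position can be predicted in either case.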

Photographic emulsions are not sensitive to individual photons. Several
neutral atoms must be produced in the photographic detector to form
a development center. More efficient for imaging applications than
photographic plates are the CCD (charge coupled device) detectors. They are
formed by a two-dimensional array of photon detectors laid out on a single
silicon chip typically comprising 2'’ • 1'' pixels of size 20 µm × 20 µm. In fact,
there are many formats for CCDs available. The arrival of single photons
is detected directly by CCDs by converting them to electron-hole pairs, and
then collecting the electrons into a potential well created by a depletion
region. The accumulated charge at each position over the detector array then
corresponds to the pattern of photons striking the CCD. The charge is read
out by clocking the potential so that buckets of electrons transfer from well
to well until they reach an integrating capacitor and an on-chip preamplifier.
An important feature of the device is that each pixel can be separately
addressed, making CCDs extremely powerful for imaging applications.
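
The well-to-well charge transfer described above can be pictured as two nested shift operations. The following is a schematic sketch only: it assumes lossless transfer, a single serial register along one edge of the array, and no amplifier model; the function name `ccd_readout` and the tiny example frame are hypothetical.

```python
def ccd_readout(wells):
    """Read out a CCD frame by shifting charge packets from well to well.

    wells -- list of rows, each a list of accumulated electron counts.
    Returns the sequence of charge packets arriving at the output node.
    """
    frame = [row[:] for row in wells]  # copy so the input is not modified
    n_rows, n_cols = len(frame), len(frame[0])
    output = []
    for _ in range(n_rows):
        # Parallel clock: every row moves one step toward the serial register;
        # the last row enters the register and an empty row enters at the top.
        serial_register = frame.pop()
        frame.insert(0, [0] * n_cols)
        for _ in range(n_cols):
            # Serial clock: packets move one well toward the output node;
            # the packet in the last well is delivered to the output.
            output.append(serial_register.pop())
            serial_register.insert(0, 0)
    return output


frame = [[1, 2, 3],
         [4, 5, 6]]
samples = ccd_readout(frame)
print(samples)
```

Because the order of arrival at the output is fixed by the clocking scheme, each sample can be assigned back to its pixel position, and the total charge is conserved in this idealized model.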

Whereas photography first processes the optical information to form
an image which is then recorded on the emulsion of a photographic plate
or a CCD detector array, it is also possible to record the raw optical data in
a non-localized way on the photographic emulsion or CCD and then place
the processing in the future with the viewer. The method for recording
the complete raw optical data is called optical holography. It is a two-step
processing method which involves the phenomena of

• scattering,

• stationary interference,

• diffraction.

In the first processing step, the holographic image encoding procedure,
the beams scattered by the solid object are mixed with the coherent reference
beam and the generated stationary quantum interference pattern is recorded
as an optical hologram. In the second processing step, the holographic image
decoding procedure, the quantum interference pattern is read out.

The quantum interference patterns are not stationary unless the interfering
light wavelets are coherent. There are two types of coherence, both
of which are required, at least to some extent, to get a stationary interference
fringe pattern. One is temporal (longitudinal or axial) coherence, which is
the requirement that the light wavelets all oscillate with the same period or
frequency, i.e., that the light is monochromatic. The other is spatial (transverse
or lateral) coherence, which is the requirement that the light wavelets are moving
together in phase as if they started from a single point in space. The laser
produces a high degree of both temporal and spatial coherence.

Figure 3.2

The basic experimental set-up to generate optical holograms is a modified
version of the archetypical double-slit quantum interference experiment
by which Thomas Young in 1803 conclusively verified the wave character
of monochromatic light, and by which G.P. Thomson, the son of J.J. Thomson who
first demonstrated that electrons behave like particles, and also Davisson
and Germer in 1927 conclusively revealed that electrons also behave like
waves. Downstream, the primary wave is divided by a beam splitter into
two coherent wavelets which travel different paths before recombination
and detection; see Figure 3.2. If in the beam splitter quantum interference
experiment the two photon routes possible in the linear Mach-Zehnder
interferometer are exactly equal in length, there is a 100% probability that the
photon reaches the detector A and a 0% probability that it reaches the other
detector B. In other words: the photon is certain to strike the detector A. If
an absorbing screen is placed in the way of either of the two interferometer
routes, then it becomes equally probable that the photon reaches A or B; but
when both paths are open and of the same length, only detector A can be
reached. Blocking off one of the two routes allows B to be reached. Therefore
the photon must have actually traveled both routes at once. If a scattering
object is placed in the way of either of the two interferometer routes, a
stationary quantum interference pattern is generated in the hologram plane
due to the incremental measuring principle of the linear Mach-Zehnder
interferometer. The organization of random clusters to stationary quantum
interference fringe patterns in the hologram plane can be visualized by the
photon-counting image acquisition system (PIAS-TI) which is capable of
detecting even single photons like the vertebrate's retinal rod and of obtaining
images at a level of darkness not obtainable even by ultra-high sensitivity
video cameras.
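
The detection probabilities quoted for the Mach-Zehnder arrangement follow from adding amplitudes rather than intensities. A minimal sketch of that bookkeeping is given below. It assumes ideal, lossless, symmetric 50/50 beam splitters (transmission amplitude $1/\sqrt{2}$, reflection amplitude $i/\sqrt{2}$), exactly equal arm lengths, and a perfectly absorbing screen; these conventions and the names `apply` and `mach_zehnder` are assumptions made for illustration, and which output port is called "A" is a matter of labeling.

```python
import math

S = 1.0 / math.sqrt(2.0)
# Symmetric 50/50 beam splitter (an assumed convention): transmission
# amplitude 1/sqrt(2), reflection amplitude i/sqrt(2).
BEAM_SPLITTER = ((S, 1j * S), (1j * S, S))


def apply(matrix, state):
    """Multiply a 2x2 matrix with a two-component amplitude vector."""
    return tuple(sum(matrix[r][c] * state[c] for c in range(2)) for r in range(2))


def mach_zehnder(block_arm=None):
    """Return detection probabilities (p_A, p_B) for a single photon.

    block_arm -- None for both arms open, or 0 / 1 to put an absorbing
                 screen into that arm.
    """
    state = (1.0 + 0j, 0j)                # photon enters through port 0
    state = apply(BEAM_SPLITTER, state)   # first beam splitter
    if block_arm is not None:
        # The absorbing screen removes the amplitude travelling in that arm.
        state = tuple(0j if arm == block_arm else a for arm, a in enumerate(state))
    state = apply(BEAM_SPLITTER, state)   # equal paths, then second beam splitter
    # With this convention all amplitude ends up in output port 1, which is
    # labeled "detector A" here; port 0 is "detector B".
    return abs(state[1]) ** 2, abs(state[0]) ** 2


p_open = mach_zehnder()
p_blocked_0 = mach_zehnder(block_arm=0)
p_blocked_1 = mach_zehnder(block_arm=1)
print(p_open, p_blocked_0, p_blocked_1)
```

With both arms open the two contributions to detector B cancel, so only A responds. With either arm blocked the cancellation is lost and A and B become equally likely, each with probability 1/4 in this model, the remaining 1/2 being absorbed by the screen.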

Theorem  3.1.  The  holographic  image  encoding  procedure  is  formed  by 
a  complex  linear  superposition  of  beam  splitter  quantum  interference  ex¬ 
periments.  Conversely,  the  beam  splitter  quantum  interference  experiment 
performed  by  a  linear  Mach-Zehnder  interferometer  forms  a  degenerate 
holographic  image  encoding  procedure. 

From  this  result  it  follows  that  the  basic  quantum  phenomenon  of  op¬ 
tical  holography  is  the  quantum  parallelism  according  to  which  different 
alternatives  at  the  photon  level  are  allowed  to  coexist  in  quantum  complex 
linear superposition. The great thirty-year dialogue between Bohr and Einstein
[190, 191, 192, 194, 193, 195, 177] concerning the issues of the beam
splitter  quantum  interference  experiment  demonstrates  the  fundamental 
importance  of  the  basic  holographic  image  encoding  procedure  (see  The¬ 
orem  11.3  infra). 

The encoded form of the amplitude and phase distribution is called a
hologram. Basically, the hologram represents an interference pattern
that arises from the superposition of the waves scattered by the object
with the reference wave. The function of the reference wave can also be
illustrated by saying that it "freezes" a light wave in space. It is the
amplitudes, taken with their phase relations, that add, and not the
intensities.

Jurij I. Ostrovskij (1988)

The various descriptions, Doppler filtering, aperture synthesis, holography,
and cross-correlation, diverse as they are when described physically,
become identical when formulated mathematically.

Emmett N. Leith (1978)

The fundamental fact about optical holograms is that they are light diffracting
elements. As planar diffractive optical elements (DOEs) they are of
central importance for the implementation of neurocomputer architectures
by holographic optical interconnects. The starting point of the holographic
principle is the important fact that all detectors are phase-blind at the
frequency range of visible light. To overcome the phase-blindness, optical
holography  encodes  in  the  writing  step  the  phase  information  of  the  optical 
signals  by  a  geometric  encoding  procedure.  The  word  holography  comes 
from  the  Greek  holos  meaning  entire,  complete  and  graphein  meaning  to 
write,  to  record.  Subsequently,  optical  holographic  technique  decodes  in  the 
readout  step  the  phase  information  by  light  diffraction.  The  holographic 
reciprocity  principle  mathematically  describes  the  decoding  procedure. 
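
How a phase-blind detector can nevertheless store phase may be seen in a deliberately simplified scalar example. The sketch below assumes an object wave that is constant over the recorded strip (a single complex number), a tilted plane reference wave of unit amplitude, and an ideal square-law detector; the readout is modeled by multiplying the recorded fringes with the reference wave and averaging over whole carrier periods. The names `record_hologram`, `read_out`, and `carrier_cycles` are hypothetical, and the example is not meant to reproduce the holographic reciprocity principle in its general form.

```python
import cmath
import math


def record_hologram(object_wave, carrier_cycles, n_samples):
    """Intensity seen by a phase-blind square-law detector.

    The object wave (a complex number, assumed constant over the recorded
    strip) is mixed with a tilted plane reference wave of unit amplitude,
    R(x) = exp(2*pi*i*carrier_cycles*x), sampled at n_samples points of [0, 1).
    """
    intensity = []
    for k in range(n_samples):
        reference = cmath.exp(2j * math.pi * carrier_cycles * k / n_samples)
        intensity.append(abs(object_wave + reference) ** 2)
    return intensity


def read_out(intensity, carrier_cycles):
    """Illuminate the recorded fringes with the reference wave and average.

    The average over whole carrier periods removes the terms oscillating
    at the carrier and at twice the carrier frequency; what remains is the
    object wave.
    """
    n_samples = len(intensity)
    total = 0j
    for k, value in enumerate(intensity):
        reference = cmath.exp(2j * math.pi * carrier_cycles * k / n_samples)
        total += value * reference
    return total / n_samples


object_wave = cmath.rect(0.6, 1.1)  # hypothetical amplitude 0.6 and phase 1.1 rad
fringes = record_hologram(object_wave, carrier_cycles=8, n_samples=256)
recovered = read_out(fringes, carrier_cycles=8)
print(abs(recovered), cmath.phase(recovered))
```

The recorded values are real intensities only, yet the position of the fringes relative to the reference carries the phase, so that both amplitude and phase of the object wave are recovered in the readout step.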

In order to get mathematical insight into the geometric encoding procedure
of optical holography by sesquilinearization of the multiplexed signal
energy, let $\mathcal{S}(\mathbb{R})$ denote the Schwartz space of complex-valued $C^\infty$ wavelet
packet amplitude densities on the real line $\mathbb{R}$ rapidly decreasing at infinity.
Consider $\mathcal{S}(\mathbb{R})$ as a dense vector subspace of the complex Hilbert space $L^2(\mathbb{R})$
of square integrable complex-valued densities with respect to Lebesgue measure
$dt$ of $\mathbb{R}$ under its natural isometric embedding. Endow $\mathcal{S}(\mathbb{R})$ with the
standard scalar product $\langle\,\cdot\,|\,\cdot\,\rangle$ and the associated total signal energy norm
$\|\cdot\|_2$. In optical holography, a square-law detector encodes in a massively
parallel way the optical path length difference $x \in \mathbb{R}$ and the phase difference
$y \in \mathbb{R}$ of two coherent signals having the same center frequency $\nu \neq 0$.
Assuming that the writing complex-valued wavelet packet amplitude densities
$\psi(t')\,dt'$ and $\varphi(t)\,dt$ with respect to Lebesgue measure $dt' = dt$ belong to
the space $\mathcal{S}(\mathbb{R})$, the coordinates $(x, y)$ of the stationary quantum interference
pattern are simultaneously recorded in the hologram plane $\mathbb{R} \oplus \mathbb{R}$ by the
coherent two-wavelet mixing $\psi(t')\,dt' \otimes \varphi(t)\,dt$. In view of the phase-conjugate
cross terms or interference terms $\langle\psi|\varphi\rangle$ and $\langle\varphi|\psi\rangle$ of the total signal energy
distribution identity or signal intensity relation

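$$\|w\psi + w'\varphi\|_2^2 = |w|^2\,\|\psi\|_2^2 + w\bar{w}'\,\langle\psi|\varphi\rangle + \bar{w}w'\,\langle\varphi|\psi\rangle + |w'|^2\,\|\varphi\|_2^2$$

with complex weights $w, w' \in \mathbb{C}$, the sesquilinear extension to $\mathcal{S}(\mathbb{R}) \otimes \mathcal{S}(\mathbb{R})$ of
the mapping

$$\psi \otimes \varphi \mapsto H_\nu(\psi, \varphi; x, y) = \int_{\mathbb{R}} \psi(t - x)\,\bar{\varphi}(t)\,e^{2\pi i \nu y t}\,dt \qquad (\nu \neq 0)$$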


describes by quantum complex linear superposition of the phased two-time
average the first step of the angle image encoding procedure of optical
holography. In this first processing step, each object to be globally stored by the
coherent object signal beam is encoded in the hologram prior to its recording
by mixing (or multiplexing or heterodyning) it with an unfocused linearly
polarized coherent non-object-bearing reference signal beam having a particular
angle between its wave vector and the normal vector of the hologram plane
$\mathbb{R} \oplus \mathbb{R}$. Therefore the sesquilinear mapping defined by the assignment

$$\psi(t')\,dt' \otimes \varphi(t)\,dt \mapsto H_1(\psi, \varphi; x, y) \cdot dx \wedge dy$$


is  called  the  holographic  transform  of  the  complex-valued  writing  wavelet 
packet  amplitude  densities  [162,  150,  151].  It  should  be  observed  that  un¬ 
like  sequential  data  processing,  the  holographic  transform  of  mixed  wavelet 
packet  amplitude  densities  as  indicated  above  does  not  treat  time  as  a  se¬ 
quencer  but  as  an  expresser  of  information  similarly  to  biological  neural 
systems  where  time  is  used  throughout  as  one  of  the  fundamental  repre¬ 
sentational  coordinates.  Moreover,  it  should  be  noticed  that  the  coordinates 
(x,  y )  of  the  stationary  quantum  interference  pattern  are  independent  of  the 
distance  between  the  object  to  be  globally  recorded  and  the  square-law  de¬ 
tector  located  in  the  hologram  plane.  It  follows  by  quantum  theoretical 
state-vector  reduction  which  is  a  nonlinear  procedure: 

Theorem 4.1. Let $\psi$ and $\varphi$ be wavelets in $\mathcal{S}(\mathbb{R})$ of unit energy
$\|\psi\|_2 = \|\varphi\|_2 = 1$. Then the phased two-time average

$$|H_1(\psi, \varphi; x, y)|^2$$

which records the stationary quantum interference pattern generated by the
coherent two-wavelet mixing $\psi(t')\,dt' \otimes \varphi(t)\,dt$ provides the probability of
detecting a photon within a unit area attached to the point $(x, y)$ of the
hologram plane $\mathbb{R} \oplus \mathbb{R}$.
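
The probabilistic reading of the theorem can be tested numerically for a simple pair of wavelets. The sketch below assumes the integral form $H_\nu(\psi, \varphi; x, y) = \int \psi(t - x)\,\bar{\varphi}(t)\,e^{2\pi i \nu y t}\,dt$ with $\nu = 1$, takes both wavelets to be the same unit-energy Gaussian, and replaces the integral by a Riemann sum; the function names, grid sizes, and the choice of Gaussian are assumptions made for illustration. If the squared modulus is a probability density on the hologram plane, its integral over the plane should be 1.

```python
import cmath
import math


def holographic_transform(psi, phi, x, y, nu=1.0, t_max=6.0, dt=0.05):
    """Riemann-sum approximation of
    H_nu(psi, phi; x, y) = integral psi(t - x) * conj(phi(t)) * exp(2 pi i nu y t) dt.
    """
    n = int(round(2 * t_max / dt))
    total = 0j
    for k in range(n + 1):
        t = -t_max + k * dt
        total += psi(t - x) * phi(t).conjugate() * cmath.exp(2j * math.pi * nu * y * t)
    return total * dt


def gaussian(t):
    """Unit-energy Gaussian wavelet, returned as a complex number."""
    return complex(2.0 ** 0.25 * math.exp(-math.pi * t * t))


step = 0.25
grid = [-3.0 + step * k for k in range(25)]  # x and y values from -3 to 3
density = [[abs(holographic_transform(gaussian, gaussian, x, y)) ** 2 for y in grid]
           for x in grid]
total_probability = sum(sum(row) for row in density) * step * step
peak = density[12][12]  # value at x = y = 0
print(total_probability, peak)
```

For this Gaussian pair the density is concentrated around the origin of the hologram plane, where it equals the squared total energy of the wavelet, and it integrates to 1 within the accuracy of the grid.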

The method of optical holography or coherent wavefront reconstruction
applies to all waves: to electron waves, X-rays, light waves, acoustic
waves, and seismic waves, provided the wavelets are coherent enough to
form  the  required  stationary  quantum  interference  patterns  in  the  hologram 
plane  [161].  Therefore  a  laser  is  not  really  needed  for  optical  holography;  it  is 
merely  the  use  of  solid,  three-dimensional  objects  that  calls  for  light  wavelets 
whose  coherence  length  exceeds  the  path  differences  due  to  the  unevenness 
of  such  objects.  Dennis  Gabor  used  in  1947  a  filtered  mercury  arc  lamp  to 
get  temporal  coherence  and  a  pinhole  to  get  spatial  coherence  in  the  first  op¬ 
tical  hologram.  The  main  reason  for  the  discouraging  quality  of  his  optical 
holograms  was  that  no  light  source  existed  at  that  time  with  the  combined 
intensity and coherence that was needed. When laser light became available,
the quantum interference experiments by Emmett N. Leith and Juris
Upatnieks at the University of Michigan in 1962 resulted in excellent optical
display holograms that astonished the scientific community.

In radar analysis, the mapping $(x, y) \mapsto H_\nu(\psi, \varphi; x, y)$ is called the
narrowband cross-ambiguity function [16, 48, 94, 186, 122, 125, 153, 155] and
is used to characterize the resolution performance of radar signals. In the
following it will be convenient to define the narrowband auto-ambiguity
function $H_\nu(\psi; \cdot, \cdot)$ by $H_\nu(\psi, \psi; \cdot, \cdot)$. The mapping

$$\psi(t)\,dt \mapsto H_1(\psi; x, y) \cdot dx \wedge dy$$

which describes the self-interference of photons is called the holographic
trace transform. In view of Theorems 3.1 and 4.1 supra, the holographic
trace transform models the classical beam-splitter quantum interference
experiment.

Remark  4.2.  The  only  examples  of  strictly  convex  objects  for  which  the 
scattering  amplitude  density  has  been  analyzed  fairly  completely  for  high 
frequencies ($|\nu| \to \infty$) are the compact spheres of the Euclidean space $\mathbb{R}^3$.
According  to  the  synergetic  point  of  view,  however,  optical  holography  does 
not  attempt  to  mathematically  predict  the  scattering  amplitude  densities  but 
geometrically  encodes  and  decodes  the  scattering  amplitude  densities  and 
their  phases  as  an  experimental  result  utilizing  coherent  reference  beams. 

Remark  4.3.  A  vital  element  of  optical  neurocomputer  architectures  is  the 
medium  for  optical  hologram  recording  because  it  plays  the  role  of  an  opti¬ 
cal  holographic  associative  memory.  An  associative  memory  has  the  basic 
capability  of  storing  a  number  of  associated  information  patterns  (u,v),  so 
that  subsequent  presentation  of  one  pattern  u  recalls  its  paired  pattern  v. 
This  is  an  inherently  parallel  procedure,  and  the  attractiveness  of  optical 
implementations  of  holographic  associative  memories  has  been  recognized 
for  some  time.  Electro-optical  photorefractive  crystals  (PRCs)  are  known 
to  form  reusable  optical  holographic  storage  materials  that  can  be  infinitely 
recycled and do not require additional processing. The crystals of the sillenite
family, bismuth silicon oxide Bi$_{12}$SiO$_{20}$ (BSO), bismuth titanium oxide
Bi$_{12}$TiO$_{20}$ (BTO), and bismuth germanium oxide Bi$_{12}$GeO$_{20}$ (BGO), exhibit
the highest sensitivity to light among presently known PRCs [178]. Optical
holograms are recorded inside PRCs directly by illuminating the crystal
with laser light. The light induces a charge redistribution inside the crystal
[49, 50] and in a certain characteristic time interval a dynamic equilibrium
between  distributions  of  the  recording  light  intensity  and  internal  electric 
charge  is  established.  The  electric  charge  induces  an  internal  electrostatic 
field  that  changes  the  refractive  index  of  the  crystal  by  the  electro-optical 
(Pockels  or  Kerr)  effect  and  forms  a  volume  holographic  optical  element 
(VHOE). As the interference pattern undergoes changes, a new charge distribution
is formed, hence a new optical hologram is recorded. This charge
distribution again comes to a dynamic equilibrium with the recording quantum
interference pattern. If the period during which the interference pattern
changes is sufficiently long, the electro-optical crystal rerecords an optical
hologram. Hence an electro-optical PRC can adapt itself to varying external
conditions, such as occasional temperature-induced changes of the phase
difference between the writing object signal beam and reference signal beam,
or mechanical instabilities. This is an extremely important feature because it
allows more reliable storage of scattering objects by almost-real-time quantum
holography.

Research  in  the  area  of  real-time  quantum  holography  in  electro-optical 
PRCs needs to focus on materials in order to achieve a faster speed of photoresponse
(< 1 msec), greater photorefractive sensitivity, control over image
decay, and reduced fanning. Molecular engineering has recently developed
highly interesting and promising organic crystals. As an alternative,
bioelectronics or molecular electronics are offering photochemically sensitive
materials like biopolymers of the chlorophyll-protein complex and the
retinal-protein  complex  for  real-time  holographic  recording.  It  has  been 
discovered  that  specifically  bacteriorhodopsin  which  belongs  to  the  retinal- 
protein  complex  and  which  can  be  extracted  from  the  purple  membrane  of 
Halobacterium  halobium  is  a  very  attractive  recording  material  for  real-time 
optical  signal  processing.  Depending  on  the  preparation  procedure,  these 
materials  have  a  very  wide  range  of  photoresponse  time  running  from  100 
sec  down  to  10  psec,  and  an  extremely  high  spatial  resolution  limited  by 
the  dimensions  of  the  molecules.  However,  research  in  this  area  is  still  in 
the  early  stage  of  development  and  for  the  present  the  studies  are  far  from 
the practical implementation of potential biological neurocomputers. Nevertheless,
investigations of the simplest optical processors and of associative
memory elements based on biopolymers are being intensively developed in
various laboratories all over the world, so that it is expected that on the basis
of purple membranes of halobacteria an optical memory with a capacity of
10’ bits/cm^ will be created in the near future [148].

Remark  4.4.  According  to  the  rules  of  quantum  theory,  any  two  states  what¬ 
ever,  irrespective  of  how  different  from  one  another  they  might  be,  can  coexist 
in  any  quantum  complex  linear  superposition.  This  is  the  deep  and  deci¬ 
sive  reason  for  the  fact  that  high-resolution  radar  imagery  of  the  terrain  and 
optical  holographic  imaging  are  closely  related  concepts.  In  fact,  airborne 
and  spaceborne  SAR  imaging  systems  are  active  remote  sensing  systems 
which  illuminate  the  terrain  with  electromagnetic  energy  at  relatively  long 
wavelengths (radar L-band center wavelength $\lambda = 23$ cm, C-band center
wavelength $\lambda = 5.7$ cm, X-band center wavelength $\lambda = 3.1$ cm) as the platform
moves with respect to the ground being mapped. SAR imaging systems
coherently detect the signals returning from the terrain (called radar return)
in order to store them in amplitude density including phase until all of the
returns are collected. Simultaneous amplitude density and phase recording
is performed by multiplexing or heterodyning the radar returns with a reference
signal in order to generate microwave holograms [55, 93]. The signals
that are collected and coherently superposed in SAR systems by small antennas
are not already focused, as is the case in real aperture systems like
radar altimeters. Because extensive processing is required to form the SAR
image  from  the  radar  return,  optical  signal  processing  techniques  have  been 
applied  to  the  collection  and  processing  of  SAR  data.  Chronologically,  the 
coherent  optical  systems  developed  at  the  University  of  Michigan  regarding 
applications  to  the  processing  of  SAR  data  form  the  oldest  branch  in  the 
family  tree  of  optical  computing  [32,  33,  62,  90,  94].  SAR  coherent  imag¬ 
ing  systems  can  be  regarded  as  optical  neurocomputers  which  implement  a 
Doppler  filter  bank  by  a  relatively  static  reflection  pattern  of  the  architecture 
mirror  [26,  88].  The  two-dimensional  quantum  parallelism  inherent  to  the 
optical  data  processing  approach  is  in  large  part  responsible  for  the  success 
of  SAR  coherent  imaging. 

Remark  4.5.  Since  the  advent  of  optical  holography  there  has  been  a  strong 
interest  in  replacing  lenses  and  other  crucial  parts  of  optical  systems  by  high 
performance  HOEs.  In  particular,  optical  SAR  data  processing  systems  may 
be  realized  by  optical  heads  which  include  high  performance  hololenses. 
Many  HOEs  are  fabricated  by  recording  the  stationary  quantum  interfer¬ 
ence  pattern  between  two  mixing  laser  beams.  The  use  of  digital  computer¬ 
generated  hologram  (CGH)  techniques,  however,  avoids  the  technological 
difficulties  involved  in  the  interferometric  HOE  fabrication.  Moreover,  one 
benefit  that  digital  CGHs  can  offer  that  is  not  available  with  optical  holo¬ 
graphy  is  the  ability  to  deal  with  objects  that  exist  only  mathematically. 
Finally,  high  quality  digital  CGHs  to  implement  holographic  optical  inter¬ 
connects  of  high  circuit  density  may  be  fabricated  with  the  same  technology 
used  in  the  manufacture  of  CMOS  VLSI  circuit  chips  [79, 81,  82,  87,  89, 124, 
175,  176].  The  geometrical  CGH  encoding  computations  for  specific  HOE 
pattern  parameters  are  performed  with  a  standard  computer  aided  design 
(CAD)  station.  Upon  completion  of  the  HOE  pattern  database  generation 
and  conversion  of  the  pattern  by  a  subroutine  to  the  required  formatted 
data,  a  digital  computer  controlled  output  device  such  as  a  Perkin-Elmer 
electron-beam  high-resolution  micro-lithographic  system  then  writes  the  de¬ 
sired  geometric  pattern  on  photoresist,  which  is  subsequently  processed  to 
produce  the  finished  transmissive  or  reflective  holographic  element.  It  is  at 
this  intermediate  level  of  lower  throughput  requirements  where  sequential 
processors play a role in vision and image processing. Alternatively, digital
CGHs may be realized by writing the appropriate geometrical pattern on
an SLM. In any case, digital CGHs are at the base of a technology transfer
from microelectronics to microoptics or amacronics and form a bridge
between  digital  computer  and  optical  neurocomputer  architectures.  Since 
atoms  coherently  excited  by  short  laser  pulses  (Rydberg  atoms)  may  be  as 
large  as  some  transistors  of  microelectronic  ICs  and  the  pathways  between 
them  inside  the  CMOS  VLSI  chips  [10,  202,  203],  the  quantum  theoretical 
treatment  of  optical  holography  is  of  particular  importance  for  amacronics 
(see  Section  10  infra)  and  nanotechnology. 

Investigators  in  one  field  may  very  well  never  have  been  aware  that  the 
Heisenberg  group  had  been  found  in  some  field  not  seemingly  related  to 
theirs.  Another  factor  certainly  contributory  to  its  relative  obscurity  is 
that what I call “the Heisenberg group” is not in fact one object, but a
collection of similar objects, rather like a functor, or a scheme in algebraic
geometry, or even a combination of several overlapping functors. Thus
one has to look with a certain pair of spectacles in order to see the topics
as  being  united  via  a  single  common  phenomenon. 

Roger  Howe  (1980) 

Perhaps the most rewarding aspect of analog computation is the extent
to which elementary computational primitives are a direct consequence
of fundamental laws of physics.

Carver  A.  Mead  (1989) 

I  would  like  to  express  my  belief  that  the  holographic  concept  of  Gabor  is 
as  fundamental  as  the  general  relativity  theorem  of  Einstein,  and  it  has 
to  be  explored  further  for  a  better  understanding  of  nature  in  which  we 
live. 

Pal  Greguss  (1970) 

Everything should be made as simple as possible, but not simpler.

Albert  Einstein  (1879-1955) 


5.  The  Kirillov  quantization 

Let G denote the multiplicative group of all unipotent real matrices

$$\begin{pmatrix} 1 & x & z \\ 0 & 1 & y \\ 0 & 0 & 1 \end{pmatrix} := (x,y,z)$$

with unit element (0,0,0). Then G is a simply connected two-step nilpotent
Lie group, diffeomorphic to the differential manifold $(\mathbb{R}\oplus\mathbb{R})\times\mathbb{R}$, with
one-dimensional center $C_G = \{(0,0,z) \mid z \in \mathbb{R}\}$. The polarized presentation

$$(x_1,y_1,z_1)\cdot(x_2,y_2,z_2) = (x_1+x_2,\; y_1+y_2,\; z_1+z_2+x_1y_2)$$

and the equivalent isotropic presentation

$$(x_1,y_1,z_1)\cdot(x_2,y_2,z_2) = \bigl(x_1+x_2,\; y_1+y_2,\; z_1+z_2+\tfrac{1}{2}(x_1y_2-x_2y_1)\bigr)$$

show that G is a realization of the three-dimensional Heisenberg group [23,
56, 186, 155]. The Haar measure of G is Lebesgue measure $dx\otimes dy\otimes dz$ of
the underlying differential manifold $\mathbb{R}^3$. The Lie algebra $\mathfrak{g} = T_{(0,0,0)}(G)$ of
G is formed by the upper triangular matrices $\{(x,y,z)-(0,0,0) \mid x,y,z\in\mathbb{R}\}$.
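In terms of the canonical basis $\{P,Q,Z\}$ of $\mathfrak{g}$ which is given by the matrices

$$P := \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},\quad
Q := \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix},\quad
Z := \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},$$

the Heisenberg commutation relations read as follows:

$$[P,Q] = PQ - QP = Z,\qquad [P,Z] = 0,\qquad [Q,Z] = 0.$$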
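
The matrix realization and the polarized presentation can be cross-checked mechanically. The following Python sketch is an illustration added for the reader and is not part of the original exposition. It multiplies the unipotent matrices in exact rational arithmetic, confirms that the product agrees with the polarized composition law, and verifies the commutation relations of P, Q, Z. The helper names (`mat`, `matmul`, `polarized_product`, `commutator`) are hypothetical.

```python
from fractions import Fraction


def mat(x, y, z):
    """Unipotent 3x3 matrix representing the group element (x, y, z)."""
    return [[1, x, z], [0, 1, y], [0, 0, 1]]


def matmul(a, b):
    """Plain 3x3 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def polarized_product(g1, g2):
    """Composition law of the polarized presentation."""
    x1, y1, z1 = g1
    x2, y2, z2 = g2
    return (x1 + x2, y1 + y2, z1 + z2 + x1 * y2)


def commutator(a, b):
    """Matrix commutator [a, b] = ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(3)] for i in range(3)]


# Canonical basis of the Heisenberg Lie algebra.
P = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
Q = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]
Z = [[0, 0, 1], [0, 0, 0], [0, 0, 0]]
ZERO = [[0, 0, 0] for _ in range(3)]

# Two arbitrary sample elements (exact rationals avoid rounding issues).
g1 = (Fraction(1, 2), Fraction(3), Fraction(-2))
g2 = (Fraction(4), Fraction(-1, 3), Fraction(5))

product_matches = matmul(mat(*g1), mat(*g2)) == mat(*polarized_product(g1, g2))
relations_hold = (commutator(P, Q) == Z
                  and commutator(P, Z) == ZERO
                  and commutator(Q, Z) == ZERO)
print(product_matches, relations_hold)
```

Exact `Fraction` arithmetic is used so that the comparison of the two products is an equality test rather than a tolerance test.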

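Thus the center $\mathfrak{c} = \mathbb{R}Z$ of the Heisenberg Lie algebra $\mathfrak{g}$ is one-dimensional
and satisfies $\exp(\mathfrak{c}) = C_G$. Obviously

$$\operatorname{ad}_{\mathfrak{g}}\begin{pmatrix} 0 & a & c \\ 0 & 0 & b \\ 0 & 0 & 0 \end{pmatrix}
= \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ -b & a & 0 \end{pmatrix}$$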

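for $a,b,c\in\mathbb{R}$. The adjoint action $(x,y,z)\mapsto \operatorname{Ad}_G(x,y,z)$ of G on $\mathfrak{g}$ linearizes
the action of G on itself by inner automorphisms and is therefore defined by
conjugation:

$$\begin{pmatrix} 1 & x & z \\ 0 & 1 & y \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 0 & a & c \\ 0 & 0 & b \\ 0 & 0 & 0 \end{pmatrix}
\begin{pmatrix} 1 & x & z \\ 0 & 1 & y \\ 0 & 0 & 1 \end{pmatrix}^{-1}
= \begin{pmatrix} 0 & a & c + bx - ay \\ 0 & 0 & b \\ 0 & 0 & 0 \end{pmatrix}.$$

With respect to the basis $\{P,Q,Z\}$ of $\mathfrak{g}$ it follows

$$\operatorname{Ad}_G(x,y,z) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -y & x & 1 \end{pmatrix}
\qquad ((x,y,z)\in G).$$

Consequently the identity

$$\operatorname{Ad}_G \circ \exp = \exp \circ \operatorname{ad}_{\mathfrak{g}}$$

holds as usual on $\mathfrak{g}$. If $\{P^*,Q^*,Z^*\}$ denotes the dual basis of $\{P,Q,Z\}$, the
coadjoint action $(x,y,z)\mapsto \operatorname{CoAd}_G(x,y,z)$ of G in the dual vector space
$\mathfrak{g}^* = T^*_{(0,0,0)}(G)$ of $\mathfrak{g}$ is given by the formula

$$\operatorname{CoAd}_G(x,y,z) = \begin{pmatrix} 1 & 0 & y \\ 0 & 1 & -x \\ 0 & 0 & 1 \end{pmatrix}
\qquad ((x,y,z)\in G).$$

Hence

$$\operatorname{CoAd}_G(x,y,z)(\xi P^* + \eta Q^* + \nu Z^*) = (\xi + \nu y)P^* + (\eta - \nu x)Q^* + \nu Z^*,$$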

where the triple $(\xi,\eta,\nu)$ denotes real coordinates. From this identity the
Kirillov coadjoint orbit picture of G becomes apparent: For each center
frequency $\nu \ne 0$ the orbit of the point $(0,0,\nu)$ under the $\operatorname{CoAd}_G$-action
of G is the affine plane $\mathcal{O}_\nu$ in $\mathfrak{g}^*$ through the point $\nu Z^*$ parallel to the plane
spanned by $\{P^*,Q^*\}$ through the origin of $\mathfrak{g}^*$. For $\nu = 0$ the points $(\xi,\eta,0)$ are
zero-dimensional coadjoint orbits $\mathcal{O}_{(\xi,\eta)}$ of G located in the plane spanned by
$\{P^*,Q^*\}$ through the origin of $\mathfrak{g}^*$. Notice that the symplectic plane $\mathcal{O}_\nu$ ($\nu \ne 0$)
carries the canonical differential 2-form

$$\omega_{\mathcal{O}_\nu} = \nu\cdot d\xi\wedge d\eta$$

which endows $\mathcal{O}_\nu$ with a well-defined orientation. The point-orbit
$\mathcal{O}_{(\xi,\eta)} = \{(\xi,\eta)\}$ can be identified with the Dirac measure $\varepsilon_{(\xi,\eta)}$
located at the point $(\xi,\eta)$ of the "singular" plane $\nu = 0$.
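
As a small sanity check of the adjoint and coadjoint formulas, the sketch below (an added illustration, not taken from the original text) computes the adjoint action by explicit matrix conjugation, applies the coadjoint formula in coordinates, and confirms that the pairing between $\mathfrak{g}^*$ and $\mathfrak{g}$ is invariant. It also exhibits the two orbit types: planes of constant $\nu \ne 0$ and fixed points in the plane $\nu = 0$. All function names are hypothetical.

```python
from fractions import Fraction


def matmul(a, b):
    """Plain square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def group_matrix(x, y, z):
    """Unipotent matrix of the group element (x, y, z)."""
    return [[1, x, z], [0, 1, y], [0, 0, 1]]


def group_inverse(x, y, z):
    """Inverse in the polarized presentation: (-x, -y, x*y - z)."""
    return (-x, -y, x * y - z)


def adjoint(g, X):
    """Ad(g) on X = a*P + b*Q + c*Z, computed as g X g^(-1)."""
    a, b, c = X
    lie = [[0, a, c], [0, 0, b], [0, 0, 0]]
    m = matmul(matmul(group_matrix(*g), lie), group_matrix(*group_inverse(*g)))
    return (m[0][1], m[1][2], m[0][2])


def coadjoint(g, ell):
    """CoAd(g) on ell = xi*P* + eta*Q* + nu*Z*, using the coordinate formula."""
    x, y, _ = g
    xi, eta, nu = ell
    return (xi + nu * y, eta - nu * x, nu)


def pairing(ell, X):
    """Canonical pairing between the dual space and the Lie algebra."""
    return sum(l * v for l, v in zip(ell, X))


g = (Fraction(2), Fraction(-3, 2), Fraction(7))
X = (Fraction(1), Fraction(4), Fraction(-5))
ell = (Fraction(1, 3), Fraction(2), Fraction(6))

ad_value = adjoint(g, X)                        # (a, b, c + b*x - a*y)
pairing_invariant = pairing(coadjoint(g, ell), adjoint(g, X)) == pairing(ell, X)
orbit_point = coadjoint(g, (0, 0, Fraction(6)))      # stays in the plane nu = 6
fixed_point = coadjoint(g, (Fraction(1), Fraction(2), 0))  # nu = 0: no motion
print(ad_value, pairing_invariant, orbit_point, fixed_point)
```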

In terms of the Heisenberg nilpotent Lie group G, the Kirillov quantization
procedure means the choice of an irreducible unitary linear representation
U of G acting in a complex Hilbert space $\mathcal{H}$ and the coadjoint orbit
$\mathcal{O}_U$ associated with the isomorphy class of U. Recall that U is a continuous
homomorphism of G into the unitary group $\mathbf{U}(\mathcal{H})$ of $\mathcal{H}$, i.e., $U : G \to \mathbf{U}(\mathcal{H})$
is a mapping such that

$$U((x_1,y_1,z_1)\cdot(x_2,y_2,z_2)) = U(x_1,y_1,z_1)\circ U(x_2,y_2,z_2),$$

$$U(0,0,0) = \mathrm{id}_{\mathcal{H}},$$

and such that the mapping $G\times\mathcal{H} \ni ((x,y,z),\psi) \mapsto U(x,y,z)\psi \in \mathcal{H}$ is
continuous. Irreducibility means that $(U,\mathcal{H})$ is not obtained as a direct sum
of two nontrivial linear subrepresentations of G. Equivalently, there exists
no proper closed vector subspace $\ne \{0\}$ of $\mathcal{H}$ invariant under the action of G
by U in $\mathcal{H}$.

According to the Stone-von Neumann-Mackey theorem [155] the Kirillov
quantization problem has a solution unique up to unitary isomorphy:
For any given center frequency $\nu \ne 0$, the central character

$$\chi_\nu : C_G \ni (0,0,z) \mapsto e^{2\pi i\nu z}$$

determines up to a unitary isomorphism a unique infinite-dimensional irreducible
unitary linear representation $U_\nu$ of G in the standard Hilbert space
$\mathcal{H} = L^2(\mathbb{R})$ which acts on the vector subspace $\mathcal{S}(\mathbb{R})$ according to the rule

$$U_\nu(x,y,z)\psi(t) = e^{2\pi i\nu(z+yt)}\,\psi(t-x) \qquad (t\in\mathbb{R}).$$

Thus for all elements $(x,y,z)\in G$, the transition amplitude

$$U_\nu(x,y,z)\psi \in \mathcal{S}(\mathbb{R})$$

is obtained by time shifting and phasor multiplication with respect to the
frequency $\nu \ne 0$ of the wavelet $\psi\in\mathcal{S}(\mathbb{R})$.

Linear representations of G that are unitarily isomorphic to one of the
representations $(U_\nu, L^2(\mathbb{R}))$ ($\nu \ne 0$) of G are called linear representations
of Schrödinger type. The Kirillov correspondence assigns to the coadjoint
orbit $(\mathcal{O}_\nu,\omega_{\mathcal{O}_\nu}) \in \mathfrak{g}^*/\operatorname{CoAd}_G(G)$ ($\nu \ne 0$) of the point $(0,0,\nu)$ in $\mathfrak{g}^*$ the
isomorphy class of $(U_\nu, L^2(\mathbb{R}))$. Notice that this isomorphy class contains the
Bargmann-Fock model of quantics describing bosons by annihilation and
creation operators (cf. Section 1 supra), and also the linear lattice representation
of G (see Section 11 infra). Each element of the isomorphy class, i.e.,
each linear representation of Schrödinger type of G realizes $(U_\nu, L^2(\mathbb{R}))$ by
quantum complex linear superposition.

Notice that the complex vector space of $C^\infty$-vectors for the linear representation
$U_\nu$ ($\nu \ne 0$) of G acting on $\mathcal{H} = L^2(\mathbb{R})$ is given by the Schwartz
space $\mathcal{S}(\mathbb{R})$ on $\mathbb{R}$ and that the differentiated form of $U_\nu$ reads

$$U_\nu(P) = \frac{d}{ds}U_\nu(\exp(sP))\Big|_{s=0} = -\frac{\partial}{\partial t},$$

$$U_\nu(Q) = \frac{d}{ds}U_\nu(\exp(sQ))\Big|_{s=0} = 2\pi i\nu t,$$

$$U_\nu(Z) = \frac{d}{ds}U_\nu(\exp(sZ))\Big|_{s=0} = 2\pi i\nu.$$

The linear operators $\{-\partial/\partial t,\ 2\pi i\nu t\}$ determine a representation of the
Schrödinger operators by skew-symmetric operators. In particular, these
operators satisfy the Heisenberg commutation relation [23, 155]

$$[P,Q] = PQ - QP = 2\pi i\nu\cdot\mathrm{id} \qquad (\nu\in\mathbb{R},\ \nu\ne 0).$$

Shortly after Werner Heisenberg introduced the commutation relations in
quantics, Hermann Weyl discovered in 1928 that they could be interpreted
as the structure relations for the real Heisenberg Lie algebra $\mathfrak{g}$. The commutation
relations combined with the Parseval-Plancherel theorem and the
Cauchy-Schwarz inequality provide the Heisenberg inequality [15]

$$\left(\int_{\mathbb{R}} t^2|\psi(t)|^2\,dt\right)\cdot\left(\int_{\mathbb{R}} s^2|\mathcal{F}\psi(s)|^2\,ds\right) \ge \left(\frac{1}{4\pi}\int_{\mathbb{R}}|\psi(t)|^2\,dt\right)^2$$

which expresses the local/global duality between the wavelet $\psi\in\mathcal{S}(\mathbb{R})$ and
its Fourier transform $\mathcal{F}\psi\in\mathcal{S}(\mathbb{R})$. A standard density argument shows that
the Heisenberg inequality extends to all elements of the complex Hilbert
space $\mathcal{H} = L^2(\mathbb{R})$. It implies the classical Heisenberg Uncertainty Principle

$$\Delta U_\nu(P)\cdot\Delta U_\nu(Q) \ge \pi|\nu| \qquad (\nu\in\mathbb{R},\ \nu\ne 0)$$

in terms of the standard root-mean-square deviations of the operators $U_\nu(P)$
and $U_\nu(Q)$ acting in $\mathcal{H} = L^2(\mathbb{R})$. Expressed in terms of the canonical basis
$\{P,Q,Z\}$ of the Heisenberg Lie algebra $\mathfrak{g}$, the classical Uncertainty Principle
takes the form of the Robertson relation [149]

$$\Delta U_\nu(P)\cdot\Delta U_\nu(Q) \ge \tfrac{1}{2}\,|U_\nu(Z)| \qquad (\nu\ne 0).$$
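
The Heisenberg inequality can be probed numerically. The sketch below is an added illustration and rests on one assumption that should be stated loudly: the Fourier transform is normalized as $\mathcal{F}\psi(s) = \int \psi(t)\,e^{-2\pi i s t}\,dt$, which is the convention under which the lower bound carries the constant $1/(4\pi)$. The function `heisenberg_ratio` (a hypothetical name) approximates both second moments by Riemann sums and returns the quotient of the left-hand side by the right-hand side, which should never fall below 1. A Gaussian is used as the extremal test case and a sum of two shifted Gaussians as a non-extremal one.

```python
import cmath
import math


def heisenberg_ratio(psi, half_width=6.0, n=401):
    """Return (int t^2|psi|^2)(int s^2|F psi|^2) / ((1/(4 pi)) int |psi|^2)^2.

    Assumes the kernel exp(-2*pi*i*s*t) for the Fourier transform and that
    psi is negligible outside [-half_width, half_width].
    """
    h = 2.0 * half_width / (n - 1)
    grid = [-half_width + k * h for k in range(n)]
    vals = [psi(t) for t in grid]
    # Fourier transform sampled on the same grid, by a direct Riemann sum.
    hat = [sum(v * cmath.exp(-2j * math.pi * s * t) for v, t in zip(vals, grid)) * h
           for s in grid]
    norm_sq = sum(abs(v) ** 2 for v in vals) * h
    moment_t = sum(t * t * abs(v) ** 2 for v, t in zip(vals, grid)) * h
    moment_s = sum(s * s * abs(w) ** 2 for w, s in zip(hat, grid)) * h
    return moment_t * moment_s / (norm_sq / (4.0 * math.pi)) ** 2


def gaussian(t):
    # Extremal case: exp(-pi t^2) is its own Fourier transform in this convention.
    return math.exp(-math.pi * t * t)


def two_bumps(t):
    # Non-extremal case: two Gaussians centered at t = -1 and t = +1.
    return math.exp(-math.pi * (t - 1.0) ** 2) + math.exp(-math.pi * (t + 1.0) ** 2)


ratio_gauss = heisenberg_ratio(gaussian)
ratio_bumps = heisenberg_ratio(two_bumps)
print(ratio_gauss, ratio_bumps)
```

The Gaussian ratio comes out equal to 1 up to discretization error, while the two-bump signal gives a ratio well above 1.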

The Uncertainty Principle has been one of the key relationships in
quantum theory for over sixty years. In his Chicago lectures of spring 1929,
Werner Heisenberg regarded this inequality as the precise mathematical expression
of the Uncertainty Principle within the formalism of quantics [69].
Moreover, it has been recognized as one of the fundamental results in signal
processing [13, 40, 140, 198]. Nevertheless, in the context of quantum holography
it is very important to appreciate that the Heisenberg Uncertainty
Principle does have a number of serious weaknesses. These are particularly
related to the fact that the standard deviations $\Delta U_\nu(P)$ and $\Delta U_\nu(Q)$, which
are defined by the square root of the expectation values, only give very general
information about the spreads of the probability density functions and
are insensitive to the fine structure of quantum interference patterns [139,
180, 181]. The structure of the Heisenberg group G, however, includes the
Poisson summation formula and is therefore rich enough to get around
this  inadequacy  of  the  Heisenberg  Uncertainty  Principle.  Indeed,  an  appli¬ 
cation  of  the  linear  lattice  representation  6i  of  G  to  the  interfering  wavelet 
packet  amplitude  densities  enables  the  rigorous  establishment  of  the  quan¬ 
tum  parallelism  according  to  which  different  alternatives  at  the  photon  level 
are  allowed  to  coexist  in  quantum  complex  linear  superposition  (see  Theo¬ 
rem  11.3  infra). 

Let $\check{U}_\nu$ denote the contragredient representation of $U_\nu$ so that

$$\check{U}_\nu(x,y,z) = {}^{t}U_\nu\bigl((x,y,z)^{-1}\bigr)$$

holds for all elements $(x,y,z)\in G$. Obviously

$$U_\nu|C_G = \chi_\nu,\qquad \check{U}_\nu|C_G = \chi_{-\nu} \qquad (\nu\in\mathbb{R},\ \nu\ne 0).$$

In terms of neural network theory, $\check{U}_\nu$ is the feedback or backprojection
representation of G associated to $U_\nu$ ($\nu \ne 0$). The flatness of the coadjoint
orbits $(\mathcal{O}_\nu,\omega_{\mathcal{O}_\nu}) \in \mathfrak{g}^*/\operatorname{CoAd}_G(G)$ and $(\mathcal{O}_{-\nu},\omega_{\mathcal{O}_{-\nu}}) \in \mathfrak{g}^*/\operatorname{CoAd}_G(G)$ ($\nu \ne 0$)
in the dual vector space $\mathfrak{g}^*$ of the Heisenberg Lie algebra $\mathfrak{g}$ associated
by the Kirillov correspondence with the isomorphy classes of $U_\nu$ and $\check{U}_\nu$,
respectively, is equivalent to the square integrability modulo $C_G$ of $U_\nu$ and
$\check{U}_\nu$. If $G/C_G$ is endowed with the differential 2-form

$$\omega_1 = dx\wedge dy,$$

induced by the form $\omega_{\mathcal{O}_1}$ on $\mathcal{O}_1$, the central projection and backprojection
G-slice theorem follows.

Theorem 5.1. The holographic transform is the coefficient form of the linear
Schrödinger representation $U_1$ of the polarized Heisenberg group G projected
along the center $C_G$ onto $(G/C_G,\omega_1)$. Thus the Kirillov quantization
identities

$$H_1(\psi',\varphi';x,y)\cdot dx\wedge dy = \langle U_1(x,y,0)\psi' \mid \varphi'\rangle\cdot\omega_1$$

$$H_1(\psi;x,y)\cdot dx\wedge dy = \langle U_1(x,y,0)\psi \mid \psi\rangle\cdot\omega_1$$

hold.

The irreducibility combined with the unitarity of the linear Schrödinger
representation $U_1$ of G implies that the commutant of $U_1$ is isomorphic to the
compact torus group $\mathbb{T}$. Hence from the central projection G-slice theorem
it follows:

Corollary 5.2. The holographic trace transform

$$\psi(t)\,dt \mapsto H_1(\psi;x,y)\cdot dx\wedge dy$$

extends to a mapping of $L^2(\mathbb{R})$ into $L^2(\mathbb{R}\oplus\mathbb{R})$ such that the identity

$$H_1(\psi;x,y)\cdot dx\wedge dy = H_1(\varphi;x,y)\cdot dx\wedge dy$$

implies $\psi(t)\,dt = c\,\varphi(t)\,dt$ where $c\in\mathbb{T}$ denotes a constant phase factor.

The free choice of the phase factor $c\in\mathbb{T}$ reflects the fact that only the
phase difference is of physical importance. Therefore the holographic image
encoding procedure needs the mixing of a coherent reference signal beam
by a linear Mach-Zehnder interferometer to incrementally record the phase
of the object signal beam in the hologram plane $\mathbb{R}\oplus\mathbb{R}$.

The recordings of the intensities of interference patterns are said
to be formed with a "subject wave" and a "reference wave." Yet in the
mathematical representation of the interference pattern intensity there
is nothing except arbitrary notation to distinguish one wave from the
other. We find the hologram transmittance of a plane hologram to be
symmetric in the complex amplitudes of the two forming waves.

Robert J. Collier, Christoph B. Burkhardt,
and Lawrence H. Lin (1971)

The cross-correlation viewpoint, however, better than any other, renders
understandable the well known all-range-focusing capability of the
synthetic aperture radar system, implied in our holographic viewpoint.
Since the form of the recorded signal, as manifested in the quadratic phase
factor, is a function of range, it is apparent that each range element must
be processed differently, for example, by correlation with a reference
function which is different for each range. Since the pulsing provides
resolution in range, we can store the data from each range separately and
process them differently, so that each range is cross-correlated with the
reference function proper for that range. Thus, the synthetic antenna
is in effect focused simultaneously at all ranges, a most remarkable feat
when viewed in terms of the capabilities of conventional antennas.

Emmett  N.  Leith  (1978) 


6.  Metaplectic  covariance 

Another important advantage of the Kirillov quantization approach lies in
the fact that the hidden symmetries of the holographic transform

$$\psi(t')\,dt' \otimes \varphi(t)\,dt \mapsto H_1(\psi,\varphi;x,y)\cdot dx\wedge dy$$

can be expressed by the group of automorphisms of the Heisenberg nilpotent
Lie group G keeping the center $C_G$ pointwise fixed. This group, the
metaplectic group $\mathrm{Mp}(1,\mathbb{R})$, forms a twofold cover of the symplectic group
$\mathrm{Sp}(1,\mathbb{R}) = \mathrm{SL}(2,\mathbb{R})$. Its natural action on the hologram plane $\mathbb{R}\oplus\mathbb{R}$ preserves
the center frequencies $\nu \ne 0$ [155, 165, 187]. Its action on the complex
Hilbert space $L^2(\mathbb{R})$ is performed by the metaplectic representation $\sigma$. The
representation $\sigma$ of $\mathrm{Mp}(1,\mathbb{R})$ is a projective unitary linear representation of
$\mathrm{Sp}(1,\mathbb{R})$ in $L^2(\mathbb{R})$ and satisfies the metaplectic covariance condition of the
Kirillov quantization

$$\sigma(g)^{-1}\circ U_\nu(x,y,0)\circ\sigma(g) = U_\nu\bigl(g^{-1}(x,y),0\bigr)$$

($(x,y)\in\mathbb{R}\oplus\mathbb{R}$), for all $g\in\mathrm{Sp}(1,\mathbb{R})$. It follows

Theorem 6.1. The holographic transform satisfies the metaplectic covariance
identity

$$H_1(\sigma(g)\psi,\sigma(g)\varphi;x,y)\cdot dx\wedge dy = H_1\bigl(\psi,\varphi;g^{-1}(x,y)\bigr)\cdot dx\wedge dy$$

for all complex-valued wavelet packet amplitude densities $\psi(t')\,dt'$ and
$\varphi(t)\,dt$ belonging to $\mathcal{S}(\mathbb{R})$ and all elements $g\in\mathrm{Sp}(1,\mathbb{R})$.

Notice that the action of $\mathrm{Sp}(1,\mathbb{R})$ in $L^2(\mathbb{R})$ by the metaplectic representation
$\sigma$ includes the dilations by real scaling factors $a \ne 0$, and the
one-dimensional Fourier transform $\mathcal{F}$, both being of importance for the Gabor
wavelet transform. Indeed, for the diagonal matrix $g_a$
in $\mathrm{Sp}(1,\mathbb{R})$, the scaling (or zooming) identity

$$\sigma(g_a)\psi(t)\,dt = |a|^{-1/2}\,\psi(a^{-1}t)\,dt$$

follows, and similarly for the Weyl matrix $g_0$
of $\mathrm{Sp}(1,\mathbb{R})$ the relation

$$\sigma(g_0)\psi(t)\,dt = (\mathcal{F}\psi)(t)\,dt$$

holds for all $\psi\in\mathcal{S}(\mathbb{R})$.

A Fourier transform hologram is an optical hologram which records
the stationary quantum interference pattern of two coherent wavelets whose
complex-valued wavelet packet amplitude densities at the symplectic hologram
plane $\mathbb{R}\oplus\mathbb{R}$ are the Fourier transforms of both the object and the
reference wavelet. It follows as a special case of the metaplectic covariance
identity of the holographic transform:

Corollary 6.2. For the complex-valued wavelet packet amplitude densities
$\psi(t')\,dt'$ and $\varphi(t)\,dt$ in $\mathcal{S}(\mathbb{R})$ the 90° rotation identity

$$H_1(\mathcal{F}\psi,\mathcal{F}\varphi;x,y)\cdot dx\wedge dy = H_1(\psi,\varphi;-y,x)\cdot dx\wedge dy$$

holds.

If $\check{\psi}(t)\,dt = \psi(-t)\,dt$ denotes the complex-valued wavelet packet
amplitude density of the time-reversed optical signal, the Fourier inversion
theorem yields the identity

$$\sigma(g_0^2)\psi(t)\,dt = \check{\psi}(t)\,dt.$$

Thus the hologram plane rotated through 180° corresponds to the time-reversed
writing signals.

It should be observed that the Weyl matrix $g_0$, the diagonal matrices
$g_a$ ($a \ne 0$), and the unipotent matrices $g^u$
for $u\in\mathbb{R}$ are generators of the group $\mathrm{Sp}(1,\mathbb{R})$. In fact they give rise to an
Iwasawa decomposition

$$\mathrm{Sp}(1,\mathbb{R}) = KAN$$

into compact, diagonal, and unipotent subgroups. As a paraxial ray-transfer
matrix, $g^u\in\mathrm{Sp}(1,\mathbb{R})$ defines a thin cylindrical lens of focal length $f = -1/u$
and $\sigma(g^u)$ defines the chirp modulation operator

o'(g“)3l)(t)dt  = 

of chirp rate $u \ne 0$. For $u < 0$ the chirp modulation operator $\sigma(g^u) \in
\mathbf{U}(L^2(\mathbb{R}))$ defines an up chirp amplitude density modulation and for $u > 0$
a down chirp amplitude density modulation.

Corollary 6.3. The chirp amplitude density modulation $\sigma(g^u)$ of chirp rate
$u \ne 0$ can be corrected by a thin cylindrical lens of focal length $f = 1/u$.

Finally, for the drift-length transfer matrix


the  identity 

J(or(g)tl))(t)dt  =  e-’'’’  '■’'*'8"  “a(g-“):’'i|^(t)dt 

follows where sign u = u/|u|. The phasor occurring in this formula arises by
the Maslov index which is responsible for the fact that $\mathrm{Mp}(1,\mathbb{R}) = \widetilde{\mathrm{Sp}}(1,\mathbb{R})$
forms a twofold cover of $\mathrm{Sp}(1,\mathbb{R})$. Since optical holography is phase sensitive,
it is exactly this sudden change in phase (Gouy effect) of

$$\pi/2 = \pi/4 - (-\pi/4)$$

which makes it inappropriate to place the hologram plane in the focal
planes of the beam expanding lenses of the basic interferometric set-up.

It should be observed that the construction of the metaplectic representation
$\sigma$ of $\mathrm{Mp}(1,\mathbb{R})$ is completely analogous to the construction of the spin
representations of the orthogonal groups (symmetric tensors taking the place of
anti-symmetric tensors).

Example 6.4. Let T > 0 be given and denote by $\psi_T(t)\,dt$
the rectangular pulse of duration T. In terms of the triangular pulse

$$\Lambda(t) = \begin{cases} 1-|t| & |t|\le 1 \\ 0 & |t| > 1 \end{cases}$$

and the cardinal sine mother wavelet

$$\operatorname{sinc} x = \begin{cases} \dfrac{\sin \pi x}{\pi x} & x \ne 0 \\ 1 & x = 0 \end{cases}$$

the holographic trace transform of $\psi_T(t)\,dt$ takes the form

$$H_1(\psi_T;x,y)\cdot dx\wedge dy = \Lambda\!\left(\frac{x}{T}\right)\operatorname{sinc}\bigl(y(T-|x|)\bigr)\cdot dx\wedge dy.$$

An application of Theorem 6.1 supra shows that the chirp pulse density
$\sigma(g^u)\psi_T(t)\,dt$ of duration T and chirp rate $u \ne 0$ admits the holographic trace
transform

$$H_1(\sigma(g^u)\psi_T;x,y)\cdot dx\wedge dy = \Lambda\!\left(\frac{x}{T}\right)\operatorname{sinc}\bigl((y-ux)(T-|x|)\bigr)\cdot dx\wedge dy.$$
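
The two closed forms of Example 6.4 can be compared with a direct numerical integration. The sketch below is an added illustration under several labeled assumptions: the rectangular pulse is taken centered at the origin with unit energy, the auto-ambiguity function is evaluated as $\int e^{2\pi i y t}\,\psi(t-x)\,\overline{\psi(t)}\,dt$, and the chirp is modeled as multiplication by $e^{i\pi u t^2}$. None of these conventions can be confirmed from the excerpt, so only absolute values are compared, because the phase depends on the centering and sign conventions. All function names are hypothetical.

```python
import cmath
import math


def sinc(x):
    """Normalized cardinal sine: sin(pi x)/(pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def tri(t):
    """Triangular pulse: 1 - |t| on [-1, 1], zero elsewhere."""
    return max(0.0, 1.0 - abs(t))


def rect_pulse(T):
    """Unit-energy rectangular pulse of duration T (assumed centered at 0)."""
    amp = 1.0 / math.sqrt(T)
    return lambda t: complex(amp) if abs(t) <= T / 2 else 0j


def chirped(psi, u):
    """Assumed chirp model: multiplication by exp(i*pi*u*t^2)."""
    return lambda t: cmath.exp(1j * math.pi * u * t * t) * psi(t)


def auto_ambiguity(psi, x, y, lo, hi, n=4000):
    """Midpoint-rule value of int exp(2*pi*i*y*t) psi(t-x) conj(psi(t)) dt."""
    h = (hi - lo) / n
    total = 0j
    for k in range(n):
        t = lo + (k + 0.5) * h
        total += cmath.exp(2j * math.pi * y * t) * psi(t - x) * psi(t).conjugate()
    return total * h


def closed_form_modulus(T, x, y, u=0.0):
    """|Lambda(x/T) sinc((y - u x)(T - |x|))|; u = 0 gives the plain pulse."""
    return tri(x / T) * abs(sinc((y - u * x) * (T - abs(x))))


T, u = 2.0, 0.75
samples = [(0.0, 0.0), (0.5, 0.3), (-1.2, 1.1), (1.5, -0.4)]
plain = rect_pulse(T)
chirp = chirped(plain, u)
errors_plain = [abs(abs(auto_ambiguity(plain, x, y, -T, T)) - closed_form_modulus(T, x, y))
                for x, y in samples]
errors_chirp = [abs(abs(auto_ambiguity(chirp, x, y, -T, T)) - closed_form_modulus(T, x, y, u))
                for x, y in samples]
print(max(errors_plain), max(errors_chirp))
```

With the opposite chirp sign, $e^{-i\pi u t^2}$, the same computation would match the closed form with $y + ux$ in place of $y - ux$.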

Satellite altimetry uses the ranging capability of radar sensors to measure
the surface topographic profile. An example of an advanced-type system
is the SEASAT altimeter which was put into orbit in June 1978. The
satellite orbital altitude was 790 km and the platform velocity (ground track)
v = 6.6 km/sec. SEASAT was in operation for a total of 105 days. During that
time, the altimeter provided profiles of the ocean surface with an accuracy
of a fraction of a meter. In the altimetry mode, SEASAT operated at a center
frequency of 13.5 GHz. The stable local oscillator (STALO) generated a sequence
of 12.5 nanosec pulses at a 250 MHz center frequency which was
applied to the chirp generator. The SEASAT chirp generator was a surface
acoustic wave (SAW) device fabricated on a lithium tantalate substrate. The
resulting chirp modulated pulse had a linearly decreasing frequency with an
80 MHz bandwidth and a pulse duration T = 3.2 µsec. The pulse repetition
frequency (PRF) was 1020 Hz. During the transmit mode, the chirp pulse at
250 MHz was upconverted to 3375 MHz, amplified to a 1 W level, and
multiplied by 4 to 13.5 GHz. This also multiplied the bandwidth by 4 in
order to achieve the desired 320 MHz bandwidth and height measurement
accuracy of 0.47 m. In the receive mode the chirp pulse was upconverted
to 3250 MHz, amplified to 0.1 W, multiplied by 4 to 13.0 GHz, and
used for mixing with the received echo signal.
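
The figures quoted for the SEASAT altimeter are mutually consistent, as the short arithmetic sketch below illustrates. It is an added example, and it assumes that the quoted 0.47 m corresponds to the standard range resolution $c/(2B)$ of a pulse of bandwidth B, which the text does not state explicitly. The time-bandwidth product is a derived quantity that does not appear in the text. Variable names are hypothetical.

```python
import math

C = 299_792_458.0            # speed of light in m/s

# Values quoted in the text.
chirp_bandwidth_hz = 80e6    # bandwidth at the SAW chirp generator output
multiplier = 4               # frequency multiplication factor
tx_upconverted_hz = 3375e6   # transmit chain, before multiplication
rx_upconverted_hz = 3250e6   # receive chain, before multiplication
pulse_duration_s = 3.2e-6    # chirp pulse duration T

# Frequency multiplication scales both center frequency and bandwidth.
tx_center_hz = multiplier * tx_upconverted_hz          # 13.5 GHz
rx_center_hz = multiplier * rx_upconverted_hz          # 13.0 GHz
final_bandwidth_hz = multiplier * chirp_bandwidth_hz   # 320 MHz

# Assumed relation: range resolution = c / (2 B).
range_resolution_m = C / (2.0 * final_bandwidth_hz)

# Derived (not quoted in the text): time-bandwidth product of the pulse.
time_bandwidth_product = pulse_duration_s * final_bandwidth_hz

print(tx_center_hz / 1e9, rx_center_hz / 1e9, final_bandwidth_hz / 1e6)
print(round(range_resolution_m, 3), round(time_bandwidth_product))
```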

Example 6.5. In SAR remote sensing systems (see Remark 4.4 supra), a
target at distance $r_0$ with velocity v relative to the moving platform induces
a relativistic chirp amplitude density modulation $\sigma(g^u)$ of the received echo
signal of chirp rate

$$u = \frac{4\pi v^2}{\lambda r_0}$$

where $\lambda = c/|\nu|$ denotes the center wavelength of the coherent radar. The
dependence of the chirp rate u on the range $r_0$ is called the range-azimuth
coupling.
In the SAR optical signal processor, let $\lambda'$ denote the wavelength of the
coherent light scanning the holographic film of transport velocity $v'$. If

$$v_0 = v/v', \qquad \lambda_0 = \lambda/\lambda'$$

are the relative SAR platform velocity and the relative radar wavelength,
respectively, then the radar return focuses at distance

f  _ 

47rVo 

from  the  hologram  plane.  It  follows  that  the  relativistic  effect  of  the  platform 
motion  generates  an  axial  astigmatism.  To  compensate  the  linear  range 
variation  of  the  focal  length  f,  a  wide-screen  equalizer  is  introduced  in  the 
hologram  plane.  Such  an  equalizer  takes  the  form  of  a  conical  lens  or  a  tilted 
cylindrical  lens  [90, 93, 94]  which  are  components  of  a  correcting  anamorphic 
optical  system  (cf.  Corollary  6.3  supra).  The  recent  developments  of  SLMs 
have  supplied  an  attractive  replacement  for  the  holographic  film  as  an  input 
medium.  Moreover,  two-dimensional  optical  data  processors  using  laser 
diode  illumination,  acousto-optic  (AO)  cell  data  input  and  a  CCD  detector 
array  for  the  output  have  been  designed.  For  each  realization  of  the  SAR 
data  processor,  however,  it  is  important  to  notice  that  the  spatial  resolution 
of SAR imaging systems is independent of the range $r_0$ to the target and the
velocity  v  of  the  radar  platform. 

In the imaging mode, SEASAT SAR operated at a center frequency of
1275 MHz (L-band, λ = 23.5 cm) with pulse duration T = 34 μsec and PRF
selections of 1464, 1537, 1580, and 1647 Hz admitting a spatial resolution of
25 m. The depression angle ranged between 67° and 73° and produced an
image-swath width of 100 km. The antenna was a 10.74 m by 2.16 m phased
array system deployed after orbit insertion. The microwave holographic
data for each 100 km wide image-swath were optically processed to
produce four film strips each of which covered a width of 25 km and a
length of several thousand kilometers.
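
As a quick consistency check on the quoted parameters, the center wavelength follows from λ = c/|ν|. The sketch below is illustrative only; the variable names are hypothetical.

```python
C = 299_792_458.0  # speed of light in m/s

def wavelength_cm(frequency_hz):
    """Center wavelength in centimetres for a given center frequency."""
    return 100.0 * C / frequency_hz

seasat_sar_wavelength = wavelength_cm(1275e6)  # L-band imaging mode
altimeter_wavelength = wavelength_cm(13.5e9)   # altimetry mode
strip_width_km = 100 / 4                       # four film strips per 100 km swath

print(round(seasat_sar_wavelength, 1), round(altimeter_wavelength, 2), strip_width_km)
```

The L-band value agrees with the 23.5 cm quoted above, and dividing the swath into four strips gives the stated 25 km width.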

The first Shuttle imaging radar (SIR-A) experiment was launched on the
second flight of Columbia in November 1981. The orbital altitude
was 250 km and the image-swath width 50 km in order to cover a surface area
of about 10 million km². The SAR antenna of 9.44 m by 2.09 m radiating area
was stowed inside the Shuttle cargo bay and operated when the Shuttle was
in an inverted attitude. As in the SEASAT SAR, the transmitted pulse was
a chirp pulse of 1275 MHz center frequency admitting a spatial resolution
of 38 m. The image data were recorded as holographic film on board the
Shuttle. The data film was developed and then processed at the laboratory by
coherent laser light to generate the original image film at a scale of 1 : 500,000.

A second Shuttle imaging radar (SIR-B) experiment was conducted in
October 1984. For SIR-B the SAR antenna was modified, however, to permit
the depression angle to be changed during the mission within a range of 30°
to 75°. The center wavelength λ was the same as in the earlier missions.
The orbital altitude was 225 km and the spatial resolution improved
to 25 m.


Figure  6.1 

As an example, Figure 6.1 shows a SAR image of the Lakshmi region
of the planet Venus. It was generated by the Soviet Union's VENERA
15 and 16 orbiters through the cloud-shrouding atmosphere of Venus, which
is impenetrable for visible light. NASA flew a SAR around Venus in 1990
for the Magellan mission. The radar operates at a center frequency of
2385 MHz and provides a resolution down to 250 m; see Figures 6.2 and 6.3.
By the late 1990s the Cassini spacecraft may be put in an orbit around Saturn
and image its moon, Titan, at L-band and K-band on flybys.

Remark 6.6. The Heisenberg group G carries a sub-Riemannian metric and
a sub-Laplacian [174]. On the fibre $T^*_{(x,y,z)}(G)$ with base point $(x,y,z) \in G$
of the cotangent bundle $T^*(G)$ of G, the associated bundled quadratic form
Q is given by

$$Q_{(x,y,z)}(\xi,\eta,\nu) = (\xi + \nu y)^2 + (\eta - \nu x)^2.$$

Thus $Q_{(x,y,z)}$ is a parabolic quadratic form with one-dimensional null-space
spanned by the vector $(-y, x, 1) \in \mathcal{O}_1$, $(\mathcal{O}_1, \omega_{\mathcal{O}_1}) \in \mathfrak{g}^*/\mathrm{CoAd}_G(G)$. The
sub-Laplacian $\square_G$ of G forms a sub-elliptic linear differential operator given by
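
A hedged sketch of one operator compatible with the quadratic form Q: assuming that the sub-Laplacian is the sum of squares of the two vector fields whose symbols are the linear factors of Q (sign and normalization conventions are assumptions here and may differ from the author's), one may write

```latex
X = \frac{\partial}{\partial x} + y\,\frac{\partial}{\partial z}, \qquad
Y = \frac{\partial}{\partial y} - x\,\frac{\partial}{\partial z}, \qquad
\square_G = X^2 + Y^2, \qquad
[X, Y] = -2\,\frac{\partial}{\partial z}.
```

Replacing $\partial/\partial x$, $\partial/\partial y$, $\partial/\partial z$ by $\xi$, $\eta$, $\nu$ turns $X^2 + Y^2$ into $(\xi + \nu y)^2 + (\eta - \nu x)^2$, and the vector $(-y, x, 1)$ annihilates both linear factors, in agreement with the stated null-space. Because the commutator $[X, Y]$ recovers the missing central direction, the operator is sub-elliptic although its symbol is degenerate.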


The  Heisenberg  helix  is  the  analog  of  a  geodesic  for  the  sub-Riemannian 
geometry  of  G  defined  by  the  sub-elliptic  bundled  quadratic  form  Q  on 
T*(G).  This  fact  corresponds  to  the  "expansion  theorem"  discovered  by 
Dennis  Gabor  in  1965  which  says  that  information  attached  to  an  optical 
signal  pattern  is  not  carried  by  "rays",  but  by  a  certain  "tube  of  rays"  the 
cross-section of which is proportional to the square of the center wavelength
λ of the optical signal [57].


I have no doubt that my latest publication is my luckiest find yet. I
have also much perfected the experimental method, and I can now produce
really pretty reconstructions from apparently hopelessly muddled
diffraction patterns.

Dennis Gabor (1900-1979) to Max Born
(1882-1970) on 15th June, 1948


The  coherence  of  laser  light  finds  a  spectacular  application  in  the  making 
of  holograms.  A  typical  hologram  looks  like  a  gray  piece  of  plastic 
with  no  evident  image  on  it.  However,  the  hologram  actually  has  a 
microscopically fine and highly complex pattern of lines and spaces. Now
illuminate the developed hologram by the same laser system, except that
the  object  has  been  removed.  The  pattern  on  the  hologram  converts  the 
pure  laser  beam  into  a  precise  replica  of  the  pattern  of  ordinary  light  that 
would  be  obtained  if  the  object  were  still  there.  In  this  way,  the  hologram 
acts  as  a  window.  Each  eye  looking  at  the  illuminated  hologram  sees 
a  different  point  of  view,  thus  creating  a  three-dimensional  image  by 
an  illusion  of  depth  and  solidity.  By  changing  one's  vantage  point,  it 
is  possible  to  see  behind  things  and  around  corners,  just  as  if  one  was 
looking  at  the  real  object  through  an  ordinary  window.  The  image  has  a 
realistic three-dimensional appearance. Holography resulted in a whole
new  concept  in  the  development  of  imaging  systems  and  technology. 

Enders  A.  Robinson  (1989) 


7.  The  holographic  image  decoding  procedure 

In the following, the symplectic homogeneous G-manifolds

$$(\mathcal{O}_1, \omega_{\mathcal{O}_1}) \in \mathfrak{g}^*/\mathrm{CoAd}_G(G), \qquad
(\mathcal{O}_{-1}, \omega_{\mathcal{O}_{-1}}) \in \mathfrak{g}^*/\mathrm{CoAd}_G(G),$$

and the central projection G-slice $G/C_G$ will be identified with the two-sided
symplectic hologram plane $\mathbb{R} \oplus \mathbb{R}$. Then the Heisenberg nilpotent Lie group
G is a half-line bundle over the symplectic hologram plane $\mathbb{R} \oplus \mathbb{R}$, the two
sides of which carry the canonical differential 2-forms $\omega_1 = dx \wedge dy$ and
$\omega_{-1} = -dx \wedge dy = dy \wedge dx$, respectively. In view of the square integrability
modulo $C_G$ of the irreducible unitary linear representations $U_1$ and $\bar{U}_1$ of G,
an application of Schur's lemma provides the biorthogonality relations [119,
136, 155, 168]

$$\iint_{\mathbb{R}\oplus\mathbb{R}} H_1(\psi',\varphi';x,y)\,\bar{H}_1(\psi,\varphi;x,y)\,dx\,dy
= \langle \psi' \otimes \varphi \mid \psi \otimes \varphi' \rangle$$

for the complex-valued wavelet packet amplitude densities $\psi'(t)\,dt$, $\varphi'(t)\,dt$,
$\psi(t)\,dt$, $\varphi(t)\,dt$ in $S(\mathbb{R})$. Therefore the dyads

$$\begin{cases}
E(\psi',\cdot\,;x,y) : \varphi' \mapsto H_1(\psi',\varphi';x,y)\,U_1(x,y,0)\,\psi' \\
\bar{E}(\psi,\cdot\,;x,y) : \varphi \mapsto \bar{H}_1(\psi,\varphi;x,y)\,\bar{U}_1(x,y,0)\,\psi
\end{cases}$$

($(x,y) \in \mathbb{R} \oplus \mathbb{R}$), which embed $\psi' \in S(\mathbb{R})$ and $\psi \in S(\mathbb{R})$, respectively,
into the Hilbert-Schmidt (HS) operator-valued differential 2-forms acting
on $L^2(\mathbb{R})$, define a $U_1$-system $(E(\cdot,\cdot\,;x,y))_{(x,y)\in\mathbb{R}\oplus\mathbb{R}}$ and a $\bar{U}_1$-system
$(\bar{E}(\cdot,\cdot\,;x,y))_{(x,y)\in\mathbb{R}\oplus\mathbb{R}}$ of coherent states based on the symplectic hologram
plane $\mathbb{R} \oplus \mathbb{R}$ [120, 134]. Observe that these coherent state systems provide a
quantum theoretical description of nonspreading wavelet packets [57, 134,
162] and therefore of the Gabor tubes of rays (see Remark 6.6 supra).

Theorem 7.1. For all complex-valued writing wavelet packet amplitude
densities

$$\psi'(t)\,dt, \quad \varphi'(t)\,dt, \quad \psi(t)\,dt, \quad \varphi(t)\,dt$$

in $S(\mathbb{R})$ the gain equations

$$\iint_{\mathbb{R}\oplus\mathbb{R}} E(\psi',\varphi';x,y)\,dx\,dy = \|\psi'\|_2^2\,\varphi', \qquad
\iint_{\mathbb{R}\oplus\mathbb{R}} \bar{E}(\psi,\varphi;x,y)\,dx\,dy = \|\psi\|_2^2\,\varphi$$

hold.
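
The structure of the gain equations can be illustrated numerically. The sketch below is a finite analog only: it assumes that the real line is replaced by the cyclic group of order N and that U(x, y, 0) acts as a cyclic shift followed by a modulation. In that setting, summing the dyads over the whole discrete phase plane returns the signal multiplied by N times the squared norm of the writing wavelet; the factor N plays the role of the phase-plane measure. All names are hypothetical.

```python
import numpy as np

def shift_modulate(psi, x, y):
    """Finite analog of U(x, y, 0): cyclic translation by x, then modulation by frequency y."""
    n = len(psi)
    t = np.arange(n)
    return np.exp(2j * np.pi * y * t / n) * np.roll(psi, x)

def reconstruct(phi, psi):
    """Sum the rank-one dyads <U psi | phi> U psi over the whole discrete phase plane."""
    n = len(psi)
    out = np.zeros(n, dtype=complex)
    for x in range(n):
        for y in range(n):
            atom = shift_modulate(psi, x, y)
            out += np.vdot(atom, phi) * atom  # np.vdot conjugates its first argument
    return out

rng = np.random.default_rng(0)
N = 16
psi = rng.normal(size=N) + 1j * rng.normal(size=N)  # writing wavelet (arbitrary)
phi = rng.normal(size=N) + 1j * rng.normal(size=N)  # signal to be read out

lhs = reconstruct(phi, psi)
rhs = N * np.linalg.norm(psi) ** 2 * phi
max_error = np.max(np.abs(lhs - rhs))
print(max_error)
```

Any nonzero writing wavelet works here, which mirrors the statement that the gain depends on the wavelet only through its norm.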

Remark 7.2. Similar inversion formulas can be established for the affine
coherent states defined by the wavelet transform and the irreducible unitary
linear representations of the non-unimodular affine Lie group

$$G_+ = \{(a,\beta) \mid a > 0,\ \beta \in \mathbb{R}\}$$



of the real line $\mathbb{R}$ ("$at + \beta$ group"). The affine wavelets are particularly useful
code primitives for voice decomposition [65].

The non-abelian solvable Lie group $G_+$ has the presentation

$$(a_1,\beta_1)\cdot(a_2,\beta_2) = (a_1 a_2,\ a_1\beta_2 + \beta_1).$$

Of course, $G_+$ may also be represented as the group of real matrices

$$\begin{pmatrix} a & \beta \\ 0 & 1 \end{pmatrix} \qquad (a > 0,\ \beta \in \mathbb{R})$$

under matrix multiplication. The left Haar measure of $G_+$ is given by
$da \otimes d\beta/a^2$ and the right Haar measure by $da \otimes d\beta/a$. Apart from the
trivial one-point coadjoint orbits located on the real line $\mathbb{R}$, the affine group
$G_+$ of $\mathbb{R}$ admits exactly two non-trivial coadjoint orbits, the open upper
half-plane $\mathcal{O}_+$ and the open lower half-plane $\mathcal{O}_-$. It follows from the Kirillov
coadjoint orbit picture of $G_+$ that every irreducible unitary linear
representation of $G_+$ of dimension > 1 is unitarily isomorphic to either $U$ or its
contragredient representation $\bar{U}$, where $U$ can be realized on the complex
Hilbert space $L^2(\mathbb{R})$ by the assignment

$$U(a,\beta)\psi(t) = e^{i\beta e^{t}}\,\psi(t + \log a) \qquad (t \in \mathbb{R}),$$

and $\bar{U}$ by the action

$$\bar{U}(a,\beta)\psi(t) = e^{-i\beta e^{t}}\,\psi(t + \log a) \qquad (t \in \mathbb{R})$$

on $\psi \in S(\mathbb{R})$. More convenient are the realizations on $L^2(\mathbb{R}_+)$ given by

$$U(a,\beta)\psi(t) = e^{i\beta t}\sqrt{a}\,\psi(at) \qquad (t > 0),$$

and on $L^2(\mathbb{R}_-)$ given by

$$\bar{U}(a,\beta)\psi(t) = e^{i\beta t}\sqrt{a}\,\psi(at) \qquad (t < 0).$$

Notice that the irreducible unitary linear representations $U$ and $\bar{U}$ of $G_+$
are square integrable and that their coefficient functions form the wideband
ambiguity functions [48, 122].
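
The group law, the matrix picture, and the half-line realization can be checked against one another numerically. The following is a minimal sketch; it assumes the composition rule $(a_1,\beta_1)\cdot(a_2,\beta_2) = (a_1 a_2, a_1\beta_2 + \beta_1)$ and the realization $(U(a,\beta)\psi)(t) = e^{i\beta t}\sqrt{a}\,\psi(at)$ on $t > 0$, and all function names are hypothetical.

```python
import numpy as np

def compose(g1, g2):
    """Assumed group law (a1, b1) . (a2, b2) = (a1 a2, a1 b2 + b1) of the affine group."""
    (a1, b1), (a2, b2) = g1, g2
    return (a1 * a2, a1 * b2 + b1)

def as_matrix(g):
    """Upper triangular 2 x 2 matrix picture of a group element."""
    a, b = g
    return np.array([[a, b], [0.0, 1.0]])

def act(g, psi):
    """Assumed realization on the positive half-line: t -> e^{i b t} sqrt(a) psi(a t)."""
    a, b = g
    return lambda t: np.exp(1j * b * t) * np.sqrt(a) * psi(a * t)

g1, g2 = (2.0, 0.5), (0.25, -3.0)
psi = lambda t: t * np.exp(-t)  # arbitrary test function on t > 0
t = np.linspace(0.1, 5.0, 50)

matrix_ok = np.allclose(as_matrix(g1) @ as_matrix(g2), as_matrix(compose(g1, g2)))
homomorphism_ok = np.allclose(act(g1, act(g2, psi))(t), act(compose(g1, g2), psi)(t))
noncommutative = compose(g1, g2) != compose(g2, g1)
print(matrix_ok, homomorphism_ok, noncommutative)
```

The last flag confirms that the two factors do not commute, which is the non-abelian property referred to above.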

Remark 7.3. One of the most dramatic deployments of computer technology
in radiological diagnostic imaging is the development of computer-aided
tomography (CT). In this case and, more recently, in magnetic resonance
imaging (MRI) systems, the computational capability made possible by the
advent of high speed computers has been an absolutely essential ingredient
in the process of image formation. Similarly to holography, the raw data
provided by the physical imaging system in CT or MRI is in an encoded
form which bears no discernible resemblance to the two-dimensional array
of information comprising an image that can be visually perceived. CT
generates an image of a cross-sectional slice of the body lying perpendicular
to the long axis of the patient being examined. Unlike optical holography, in
which the symplectic hologram plane $\mathbb{R} \oplus \mathbb{R}$ is transversal to the direction
of laser irradiation, CT is based upon the measurement of the attenuation
of X-ray beams lying entirely within the plane of the section being imaged.
Turning from optical holography to CT [121], the preceding identities give
rise, by an application of the spectral theory of the irreducible reductive
dual pair $(\mathrm{Sp}(1,\mathbb{R}), O(n,\mathbb{R}))$ inside $\mathrm{Sp}(n,\mathbb{R})$ [78, 159], to the singular value
decomposition of the Radon transform $\mathcal{R} : S(\mathbb{R}^n) \to S(\mathbb{R} \times S_{n-1})$ acting on
functions $f \in S(\mathbb{R}^n)$ according to

$$\mathcal{R}f(r,\omega) = \int_{\mathbb{R}^n} f(x)\,\varepsilon_{(r - \langle \omega \mid x \rangle)}\,dx$$

($\varepsilon_p$ = Dirac measure located at the point $p \in \mathbb{R}$). It follows that the inversion
problem for the Radon transform $\mathcal{R}$ which underlies CT, MRI, and
tomographic reconstruction for geophysical applications is ill-posed.
Neurocomputers, however, seem to be more appropriate to solve ill-posed problems
than conventional digital computers.
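
To make the Radon transform concrete, the sketch below evaluates it for n = 2 by direct numerical integration along lines, assuming the usual reading of the Dirac measure as restricting the integral to the line $\langle \omega \mid x \rangle = r$. The test function is an isotropic Gaussian, for which the line integral is known in closed form; the function names are hypothetical.

```python
import numpy as np

def radon_line_integral(f, r, theta, half_length=8.0, samples=4001):
    """Integrate f over the line {x : <omega | x> = r}, omega = (cos theta, sin theta)."""
    s = np.linspace(-half_length, half_length, samples)
    ds = s[1] - s[0]
    omega = np.array([np.cos(theta), np.sin(theta)])
    omega_perp = np.array([-np.sin(theta), np.cos(theta)])
    points = r * omega[:, None] + s[None, :] * omega_perp[:, None]  # shape (2, samples)
    return np.sum(f(points[0], points[1])) * ds

gaussian = lambda x, y: np.exp(-(x**2 + y**2))

r_values = np.array([0.0, 0.5, 1.5])
numeric = np.array([radon_line_integral(gaussian, r, 0.7) for r in r_values])
exact = np.sqrt(np.pi) * np.exp(-r_values**2)  # closed form for the isotropic Gaussian
print(np.max(np.abs(numeric - exact)))
```

The forward transform is easy to evaluate in this way; the ill-posedness mentioned above concerns the inverse problem, which this sketch does not attempt.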

As a special case we obtain from Theorem 7.1 supra the following result
which describes the readout procedure of optical holograms, i.e., the retrieval
of geometrically encoded information by adaptive resonance. It is important
to appreciate that the energy normalization is a nonlinear procedure.

Corollary 7.4. Let $\varphi \in S(\mathbb{R})$ and assume that $\psi \in S(\mathbb{R})$ satisfies the
normalization condition $\|\psi\|_2 = 1$. If $\mathcal{F}$ denotes the Fourier transform acting
on $S(\mathbb{R})$ then the filtered backpropagation formulae of degenerate coherent
four-wavelet mixing


hold. 

The preceding reproducing diffraction integrals prove the fundamental
law of optical holography, or holographic reciprocity principle, which
governs the angle image decoding procedure of optical holograms: The
complex-valued wavelet packet amplitude density, including the magnitude
and the phase of the conjugate object signal recorded in the hologram, can be
read out simultaneously by illuminating the hologram with the conjugate to
the original reference signal beam. The conjugate beam, which becomes
unfocused by a beam expander, provides the illuminating wavelets with their
proper phase factors for adaptive resonance. Thus the geometric encoding
procedure of optical holography is able to overcome the phase-blindness of
the detectors at the frequency range of visible light; the holographic decoding
procedure reconstructs the complete wavefront creating a real pseudoscopic
image of the object. The reconstruction of a Fourier transform hologram
establishes the 90° rotation identity of Corollary 6.2 supra.

A complex-valued wavefront recorded in a planar optical hologram is
effectively stored for future reconstruction by an application of the
fundamental law of optical holography. Holographic interferometry is concerned
with the formation and interpretation of the stationary quantum interference
patterns which are created when a coherent wavelet, generated at some
earlier time and stored in an optical hologram, is later reconstructed
according to the holographic reciprocity principle, and caused to interfere with a
phase-related comparison wavelet. It is the storage or time delay aspect
which gives the holographic procedure a unique advantage over
conventional optical interferometry. It permits diffusely reflecting or scattering
objects which are subjected to stress to be interferometrically compared with
their non-deformed state. Indeed, holographic interferometry has
become one of the most important applications of the fundamental law of
optical holography.

After  the  quantum  of  energy  has  already  gone  through  the  double  slit 
screen,  a  last-instant  free  choice  on  our  part  gives  at  will  a  double-slit 
interference  record  or  a  one-slit-beam  count.  Does  this  result  mean 
that  present  choice  influences  past  dynamics,  in  contravention  of  every 
formulation  of  causality?  Or  does  it  mean,  calculate  pedantically  and 
don't  ask  questions?  Neither;  the  lesson  presents  itself  rather  as  this, 
that  the  past  has  no  existence  except  as  it  is  recorded  in  the  present. 

John  A.  Wheeler  (1978) 


I argue that the very structure of all quantum theories suggests a revision
of the classical notion of space and time. I will present evidence that
two copies of space-time, rather than one, are the proper arena for
quantum processes. At the heart of this observation lies the very well
known fact that every set of equations and formulae in quantum theory,
from which all the transition amplitudes are determined, may always
be written in two equivalent forms, differing by complex conjugation.
We obtain one set from the other by reversing the sign of the imaginary
unit i.


Iwo Bialynicki-Birula (1986)


8. Optical wavefront conjugation

The pair of reproducing diffraction integrals of the Corollary to Theorem 7.1
supra describing the holographic filter bank is also at the basis of optical
wavefront conjugation by means of real-time quantum holography [49, 50]
in electro-optical PRCs. Two of the beams are referred to as pump beams
and are arranged such that they are co-linear and counter-propagating and
overlap both spatially and temporally in the symplectic hologram plane
$\mathbb{R} \oplus \mathbb{R}$. The third beam, commonly called the probe beam, can interfere
with each of the pump beams to generate transient phase gratings within
the electro-optical PRC. These gratings arise because the refractive index of
the PRC changes in response to the intensity of laser light: as the pump
beams interfere with each other, the regions of constructive and destructive
interference cause a corresponding modulation of the refractive index. The
pump beams then entering the PRC can be deflected by the induced gratings
to produce the fourth, wavefront conjugate beam. The wavefront conjugate
beam propagates back along the path of the probe beam with a wave vector
opposite to the wave vector of the probe beam.

Recall that neurocomputers consist of weighted linear interconnections
between arrays of simple nonlinear processing units, the neurons.
Information is stored in the neurocomputer almost exclusively in the interconnection
pattern. Learning dynamics are used to evolve the interconnection strength
pattern as a succession of small perturbations. Because degenerate
four-wavelet-mixing wavefront conjugate mirrors, as described above, provide
retroreflection and optical tracking novelty filters, Theorem 7.1 is at the basis
of neural network models implemented by local neural networks of
reconfigurable holographic optical interconnect patterns in optical neurocomputer
architectures [2, 3, 4, 5, 7, 79, 81, 82, 83, 124, 126, 127, 128, 129, 130, 131, 142,
143, 144, 145, 146]. In the long term, real-time holography in PRCs appears
to be the most appealing reconfigurable optical interconnection technique.
If the holographic associative memory has net gain comparable with the
losses in the resonator cavity, the output will converge to a real image of the
globally stored object: the expanded conjugate reference signal beam acts
as an optical scanner for readout of the associated information. In the case of a
linear resonator memory, gain is supplied by the wavefront conjugate mirror
which provides regenerative feedback, whereas in the case of a loop resonator
memory, gain is supplied by an externally pumped electro-optical PRC.

The Soffer optical resonator forms an implementation of an optical
neurocomputer architecture which includes two degenerate
four-wavelet-mixing wavefront conjugate mirrors. For more details, the reader is referred
to Section 15 infra.

Geometric quantization provides the structure for the geometric
realizations of the irreducible unitary representations of the groups involved
in physics.


Norman  E.  Hurt  (1983) 


9.  Radial  isotropy 

The  vast  majority  of  optical  systems  are  designed  to  operate  over  a  field  of 
view  that  is  radially  isotropic.  If  the  processing  elements  of  an  optical  system 
are  constrained  to  be  radially  symmetric,  it  is  only  necessary  to  optimize  the 
performance  over  a  radial  slice  of  the  field  of  view.  The  system  is  then 
guaranteed  to  have  the  same  performance  over  any  radial  slice  of  the  field 
of view. The advantages of optimizing over a radial slice as compared with
the full field of view are speed and cost. Each additional field point used
in an automatic design routine such as CODE V increases the computation
time and, therefore, the expense [67].

A complex-valued writing wavelet packet amplitude density $\psi(t)\,dt$ in
$S(\mathbb{R})$ is called radially isotropic if its holographic trace transform
$H_1(\psi; x, y) \cdot dx \wedge dy$ is a radial differential 2-form on the symplectic hologram plane
$\mathbb{R} \oplus \mathbb{R}$, i.e., if $H_1(\psi; x, y) \cdot dx \wedge dy$ is invariant under the natural action of the
orthogonal group $O(2, \mathbb{R})$ in $\mathbb{R} \oplus \mathbb{R}$.

Theorem 9.1. The complex-valued wavelet packet amplitude density $\psi(t)\,dt$
on $\mathbb{R}$ is radially isotropic if and only if $\psi \in S(\mathbb{R})$ admits the form of
Hermite-Gaussian eigenmodes

$$\psi = c_n H_n$$

where $c_n \in \mathbb{C}$ is a constant and $H_n(t) = e^{-t^2/2} h_n(t)$ is the Hermite function
of degree $n \ge 0$.

The proof follows by Kirillov quantization: There is a complete
classification of the irreducible unitary linear representations of the diamond
solvable Lie group T · G having $U_\nu$ as their restrictions to G. The
Kirillov-corresponding coadjoint orbits in the dual of the non-exponential diamond
Lie algebra are paraboloids of revolution I l.'i4J.

It follows from the preceding theorem that a quantum mechanical
harmonic oscillator is equivalent to an assembly of bosons each having one
polarization state. Notice that the Hermite-Gaussian eigenmodes $(H_n)_{n \ge 0}$
are crucial for the phenomenon of daydreaming in optical resonator
neurocomputers (.9, 162).

Corollary 9.2. The elementary holograms $(H_1(H_m, H_n; \cdot, \cdot))_{m \ge 0,\, n \ge 0}$ form a
Hilbert basis of the complex Hilbert space $L^2(\mathbb{R} \oplus \mathbb{R})$.

The orthogonality of the elementary holograms
$(H_1(H_m, H_n; \cdot, \cdot))_{m \ge 0,\, n \ge 0}$
in the complex Hilbert space $L^2(\mathbb{R} \oplus \mathbb{R})$ implies that in the Shannon sense
the mutual information of the code coefficients is zero. Thus the code
coefficients are non-redundant, they form a statistically independent
ensemble, and image coding in terms of the decorrelating family of code
primitives $(H_1(H_m, H_n; \cdot, \cdot))_{m \ge 0,\, n \ge 0}$ is optimally efficient. Let the element
$\psi \in L^2(\mathbb{R} \oplus \mathbb{R})$ admit the expansion

$$\psi = \sum_{m \ge 0,\, n \ge 0} c_{m,n}\, H_1(H_m, H_n; \cdot, \cdot)$$

with complex coefficients given by

$$c_{m,n} = \langle \psi \mid H_1(H_m, H_n; \cdot, \cdot) \rangle \qquad (m \ge 0,\ n \ge 0).$$

It follows

$$\|\psi\|_2^2 = \sum_{m \ge 0,\, n \ge 0} |c_{m,n}|^2$$

and by switching to the time-asymmetric state-vector reduction procedure of
quantics, the probability that $\psi \neq 0$ in $L^2(\mathbb{R} \oplus \mathbb{R})$ collapses to the elementary
hologram $H_1(H_m, H_n; \cdot, \cdot)$ is given by the ratio

$$|c_{m,n}|^2 / \|\psi\|_2^2 \qquad (m \ge 0,\ n \ge 0).$$

The non-deterministic collapse of the wavefunction $\psi \in L^2(\mathbb{R} \oplus \mathbb{R})$ represents
the nonlinear aspect of the Kirillov quantization procedure because it violates
the quantum complex superposition principle. It is complementary to the
linear aspects of quantum holography.

Corollary 9.3. The quantum mechanical mode competition in recognizing
$\psi \neq 0$ in $L^2(\mathbb{R} \oplus \mathbb{R})$ is determined by the probabilities $(|c_{m,n}|^2 / \|\psi\|_2^2)_{m \ge 0,\, n \ge 0}$.
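
The probabilistic reading of the expansion coefficients can be illustrated in a simplified setting. The sketch below is a one-dimensional analog and not the author's construction: it expands a signal in orthonormal Hermite functions on the real line instead of in elementary holograms on the hologram plane, and it assumes the normalization $e^{-t^2/2} h_n(t) / \sqrt{2^n n! \sqrt{\pi}}$ with $h_n$ the physicists' Hermite polynomial. The test signal and all names are hypothetical.

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermval

def hermite_function(n, t):
    """Orthonormal Hermite function exp(-t^2/2) h_n(t) / sqrt(2^n n! sqrt(pi))."""
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0
    norm = math.sqrt(2.0**n * math.factorial(n) * math.sqrt(math.pi))
    return np.exp(-t**2 / 2.0) * hermval(t, coeffs) / norm

t = np.linspace(-12.0, 12.0, 4801)
dt = t[1] - t[0]
basis = np.array([hermite_function(n, t) for n in range(6)])

signal = 3.0 * basis[0] + 4.0j * basis[2]  # hypothetical superposition of two modes
coefficients = basis @ signal * dt         # c_n = <H_n | signal>; the basis is real
probabilities = np.abs(coefficients) ** 2 / (np.sum(np.abs(signal) ** 2) * dt)
print(np.round(probabilities, 4))
```

For this signal the two occupied modes carry the weights 9/25 and 16/25, and the weights sum to one, as the Parseval identity requires.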

Amacronics is a name coined for layered structures of processing
electronics, binary microoptics, and detector arrays, with applications in
imaging systems with processing right at the focal plane. Amacronic
structures are based on lessons that we learned from Mother Nature.
Human beings live quite happily with a data transfer rate of a few
kilocycles, massively parallel yes, but not very fast. All imaging systems
suffer from the Von Neumann bottleneck in electro-optics (in computer
systems all the processing functions go through a single CPU, the
central processor unit; it slows down the overall system). Electro-optical
systems are similar; all the optical information goes through a detector
array at the focal plane which is the bottleneck. We are developing
layered structures of optics and electronics in a parallel form (a processing
unit per pixel), somewhat like what happens in front of the retina of your
eye where you have similar amacrine clustered processing layers. The
word "amacrine" comes from the Greek a macros meaning short range.
The idea is to couple dynamically clusters of detector arrays. With
binary optics we may be able to build systems with peripheral vision much
more motion-sensitive than on-axis foveal view, or systems tuned for
edge detection or noise reduction.

Wilfrid  B.  Veldkamp  (1989) 

Binary  optics:  The  optics  technology  of  the  1990s. 

Wilfrid  B.  Veldkamp  (1990) 


10.  Amacronics:  the  microoptics  layer 


The  one-dimensional  unitary  linear  representations  of  the  polarized  Heisen¬ 
berg  group  G  are  given  by  the  assignment 

U(f.,n)(x,y,z)4|)(t)  =  (t  €  9t). 

These representations, which are of course irreducible, are called
representations of linear Fraunhofer type of G. Under the Kirillov correspondence
they admit one-point coadjoint orbits $\{\mathcal{O}_{(\xi,\eta)} = \{(\xi,\eta,0)\} \mid (\xi,\eta) \in \mathbb{R} \oplus \mathbb{R}\}$ in the
"singular" plane $\nu = 0$ spanned by $\{P^*, Q^*\}$ in $\mathfrak{g}^* = T^*_{(0,0,0)}(G)$, which form a
set of Plancherel measure zero. The Plancherel measure $\mu_G$ of G is uniquely
determined by the Haar measure $dx \otimes dy \otimes dz$ and concentrated on $\mathbb{R} - \{0\}$.
It is given by


$$\mu_G = |\nu|\,d\nu.$$


The character formula of G [155] provides the radial fanin/fanout
distribution on the symplectic hologram plane $\mathbb{R} \oplus \mathbb{R}$:


Trc/cU^dTicf-v') 


The tempered distributions $\mathrm{Tr}_{G/C_G} U_\nu$ ($\nu \neq 0$) follow from the trace identity
for the linear Schrödinger representation $U_1$ of the Heisenberg group G

Trc/cUi  =  Y. 

n  ^0 

by  time  scaling  t  •->  y/R  t.  Projection  of  the  Kirillov  corresponding 
paraboloids of revolution in the dual of the diamond Lie algebra along the
axis allows one to create microscopic multi-level surface relief patterns of high
quality diffractive HOEs [175, 176]; see Figure 10.1. The focal length of the
diffractive microlenses in the HOE arrays is given by

f  =  M, 

and therefore inversely proportional to the center wavelength $\lambda$. The plane
$\nu = 0$ in $\mathfrak{g}^*$ forms therefore the focal plane layer of the amacrine structure.
The quantization of the continuous phase profile into discrete phase levels
is performed by using the VLSI ion-etching technology.

Both digital and analog optical computing require sufficiently powerful
and bright sources of radiation characterized by a small size and a highly
efficient transformation of pumping energy into coherent radiation output.
Diode or injection lasers provide the best choice in terms of power
consumption and size. The coherence length of their radiation output is sufficient for
optical computing purposes.

In the AT&T Bell Laboratories' looped digital optical pipeline
processor the array of power supply beams to read the optical logic devices is
created from a pair of laser beams by a HOE component. This holograting
of Dammann type [82, 173, 185] is a multi-beam splitting DOE the pattern of
which is computed to generate a uniform 4 × 8 array of wavelets from one
incident laser beam. The two 850 nm diode lasers can therefore be used to
generate two interleaved 4 × 8 arrays so that one array illuminates all the
upper S-SEED diodes and the other illuminates all the lower ones. There
are two advantages in this scheme: two lasers can supply twice as much
power as one laser, and, by pulsing one of the pair of lasers, all of the devices
in the array can be preset into the same logic state. The two beams from
the laser pair are combined at a "knife-edge." The two
counterpropagating beams are focused to two adjacent spots, one of which is reflected, the
other transmitted. This pair of spots is imaged via the holograting onto the
device array.

Presently the most advanced implementation of two-dimensional
matrix-addressable arrays of laser diodes for free-space holographic
optical interconnect patterns and photonic switching in optical computers is
formed by a hybrid optoelectronic chip recently developed by AT&T
Bell Laboratories in collaboration with Bellcore. It includes more than
2 million electrically pumped vertical-cavity surface-emitting lasers
(VCSELs) arranged on a GaAs substrate of size less than 1 cm². The active
area of the micro-lasers emitting infrared laser light of about λ = 850 nm
wavelength consists of thin indium gallium arsenide (InGaAs) layers
sandwiched between more than 600 successive molecular beam epitaxial (MBE)
or epi layers of GaAs and aluminum arsenide (AlAs). Each of the cylindrical
microlasers has a cross-section of about 5 μm and has been etched by a
photolithographic process. The lengths of the microlasers are about 5.5 μm,



and  greater  than  99.5%  of  that  is  passive  material.  All  the  laser  diodes 
are  individually  addressable,  independently  of  the  other  ones  by  a  current 
of  about  1  mA  and  therefore  are  particularly  suitable  for  performing  the 
angle  image  encoding  and  decoding  procedures  of  optical  holograms.  In 
practice,  a  simple  4x4  matrix-addressable  surface-emitting  laser  (SEL)  array 
(MASELA)  was  used  to  address  an  optical  hologram  containing  16  distinct 
images, each microlaser reading out a separate image. The two-dimensional
MASELA is a technology for which conventional edge-emitting diode lasers
offer no practical counterpart. It is the aim of the present development
in amacronic sensor technology to integrate the optical source chip into a
hybrid VLSI neurochip.

The  Heisenberg  group  is  a  natural  setting  for  defining  and  analyzing 
certain continuous and discrete concepts arising from the Fourier transform
and associated with nonstationary image representation.

Richard  Tolimieri  (1990) 


Now  it  is  together,  blinking  happily. 

Alan  Huang  (1990) 

Fractals  or  fractal  objects  are  self-similar  structures  or  scale-invariant. 
It  can  be  understood  as  a  form  of  symmetry. 

Barry  R.  Masters  (1990) 


11.  Hololattices  and  holofractals 

The implementation of two-dimensional pixel arrays by holographic optical
interconnect patterns [79, 81, 82, 83, 185] and analog VLSI wavefront
arrays [1] suggests looking at the restrictions of the sesquilinear holographic
transform

$$\psi(t')\,dt' \otimes \varphi(t)\,dt \longmapsto H_1(\psi,\varphi;x,y)\cdot dx \wedge dy$$

to lattices located inside the symplectic hologram plane $\mathbb{R}\oplus\mathbb{R}$ [23, 157].
The quadratic lattice $\mathbb{Z}\oplus\mathbb{Z}$ embedded in the symplectic hologram plane
$\mathbb{R}\oplus\mathbb{R}$ may be considered as the projection onto $G/C_G$ of the 3-cubic
(uniform) lattice

$$L_0 := \{(\mu,\mu',\zeta) \mid \mu \in \mathbb{Z},\ \mu' \in \mathbb{Z},\ \zeta \in \mathbb{Z}\}$$

and the normal subgroup

$$L := \mathbb{Z} \oplus \mathbb{Z} \oplus C_G$$

inside the three-dimensional Heisenberg nilpotent Lie group $G$ along its
center $C_G$. Form the compact Heisenberg nilmanifold $L_0\backslash G$ associated to
$G$, which is a circle bundle over the two-dimensional compact torus $\mathbb{T}^2$. An
application of the Weil-Zak isomorphism

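$$w_1 : \psi \longmapsto \Bigl((x,y,z) \mapsto e^{2\pi i z}\sum_{\mu \in \mathbb{Z}} e^{2\pi i \mu y}\,\psi(\mu + x)\Bigr) \qquad (\psi \in \mathcal{S}(\mathbb{R}))$$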

allows one to realize the linear Schrödinger representation $U_1$ of $G$ as the linear
lattice representation

$$\varrho_1 = \operatorname{Ind}_L^G(\chi_1)$$

of $G$ [59, 152, 153, 155]. Thus, $\varrho_1$ is a representation of $G$ of linear Schrödinger
type, which turns out to be of extreme importance in quantum holography.

Remark 11.1. The projection of the Weil-Zak isomorphism $w_1$ onto the first
coordinate axis gives rise to the periodization map

$$p : \psi \longmapsto \bigl(x \mapsto w_1(\psi)(x,0,0)\bigr) \qquad (\psi \in \mathcal{S}(\mathbb{R})).$$

Combined with the one-dimensional Fourier transform $\mathcal{F}_{\mathbb{R}}$, the
periodization map $p$ gives rise to the Poisson summation formula for the elements
of the space $\mathcal{S}(\mathbb{R})$ and therefore to the Whittaker-Shannon-Nyquist-Kotel'nikov
sampling theorem, which allows the reconstruction of a band-limited
signal from its uniformly distributed samples utilizing translates of
the cardinal sine mother wavelet (cf. Example 6.4 supra). Interleaving in
the Cross Interleave Reed-Solomon Code (CIRC) is employed to redistribute
data symbols in the bit stream prior to recording so that consecutive words
are never adjacent. Recording in a non-localized way guards against the very
likely occurrence of burst errors. Upon de-interleaving during the CIRC
decoding procedure, the shuffled words are placed back in their original and
rightful position in the stream, and the errors are distributed in time.
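
The interleaving idea in this remark can be illustrated with a minimal sketch. It assumes a plain rows-by-columns block interleaver, which is much simpler than the actual CIRC scheme (CIRC uses cross-interleaved delay lines together with two Reed-Solomon codes); the function names are illustrative only.

```python
def interleave(symbols, rows, cols):
    """Write symbols row by row into a rows x cols block, read column by column."""
    if len(symbols) != rows * cols:
        raise ValueError("block must contain exactly rows * cols symbols")
    return [symbols[r * cols + c] for c in range(cols) for r in range(rows)]


def deinterleave(symbols, rows, cols):
    """Invert interleave(): put every symbol back at its original position."""
    if len(symbols) != rows * cols:
        raise ValueError("block must contain exactly rows * cols symbols")
    restored = [None] * (rows * cols)
    index = 0
    for c in range(cols):
        for r in range(rows):
            restored[r * cols + c] = symbols[index]
            index += 1
    return restored


def longest_error_run(received, original):
    """Length of the longest run of consecutive corrupted symbols."""
    longest = run = 0
    for got, want in zip(received, original):
        run = run + 1 if got != want else 0
        longest = max(longest, run)
    return longest


data = list(range(24))                    # 24 symbols in a 4 x 6 block
sent = interleave(data, rows=4, cols=6)
damaged = list(sent)
damaged[8:12] = ["X"] * 4                 # a burst of 4 adjacent corrupted symbols
restored = deinterleave(damaged, rows=4, cols=6)
```

On the recorded stream the burst is four symbols long, whereas after de-interleaving the same four errors are isolated single-symbol errors, which is the situation an error-correcting code handles best.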

Remark 11.2. For the affine Lie group $G_1$ of the real line $\mathbb{R}$ (see Remark 7.2
supra), however, there exists no analog of the linear lattice representation $\varrho_1$
of $G$. Therefore, there is no summation formula of Poisson type for $G_1$.

From the isomorphy performed by $w_1$ between the linear Schrödinger
representation $U_1$ and the linear lattice representation $\varrho_1$ of $G$ follows the
identity

$$H_1(\psi,\varphi;x,y)\cdot dx \wedge dy = \langle \varrho_1(x,y,0)\,w_1(\psi) \mid w_1(\varphi)\rangle \cdot dx \wedge dy$$

inside the pixel $]-\tfrac12,+\tfrac12[\ \times\ ]-\tfrac12,+\tfrac12[$ in the symplectic hologram plane
$\mathbb{R}\oplus\mathbb{R}$.

As a first application, the preceding identity allows one to decide in a
mathematically rigorous way the Bohr-Einstein dialogue [190, 191, 192, 194, 193,
195, 177] in favor of quantum theory. Thus the linear lattice representation
$\varrho_1$ of $G$ makes it possible to overcome the inadequacy of the classical Heisenberg
Uncertainty Principle in describing by standard root-mean-square deviations
the beam-splitter quantum interference experiment and its application to the
holographic image encoding procedure. In fact, Bohr's intuitive argument
cannot be rigorously based on any of the known uncertainty relations. The
comprehensive structure of the Heisenberg nilpotent Lie group $G$, however,
is ideally suited for the purposes of quantum holography: an application of
the preceding identity establishes the quantum parallelism in a mathematically
conclusive way. In fact, it proves:

Theorem  11.3.  The  holographic  image  encoding  procedure  implemented  by 
a  linear  Mach-Zehnder  interferometer  generates  an  optical  hologram  if  and 
only  if  quantum  parallelism  holds  between  the  reference  and  object  wavelets. 

Quantum  parallelism,  according  to  which  any  two  states  of  correlated 
photons  must  be  considered  as  taking  place  simultaneously  in  quantum 
complex  linear  superposition,  irrespective  of  how  far  from  one  another  they 
might  be,  is  a  consequence  of  the  Einstein-Podolsky-Rosen  (EPR)  type  non¬ 
locality  of  quantics.  It  has  been  verified  by  sophisticated  and  highly  accurate 
laser  experiments,  the  results  of  all  of  which  are  in  excellent  agreement 
with the quantum theoretical predictions [8, 10, 9, 11, 12]. The quantum
interference pattern has been observed even when the time interval between
the arrivals of individual photons was around 30,000 times longer than the
time  for  an  individual  photon  to  pass  through  the  linear  Mach-Zehnder 
interferometer  [135,  139].  Therefore,  in  the  context  of  quantum  holography 
the  parallelism  of  firing  neurons  (cf.  [123])  located  at  different  columns  of  the 
visual  cortex,  which  has  been  recently  observed  at  the  Max  Planck  Institute 
for  Brain  Research  in  Frankfurt  am  Main,  is  highly  remarkable. 

A  second  application  is  the  computation  of  hologratings  of  Dammann 
type  [82,  185]  which  are  based  on  planar  fabrication  techniques,  such  as 
photolithography  and  reactive  ion  etching  [189],  now  standard  in  VLSI  elec¬ 
tronic  technology.  They  act  as  multi-beam  splitting  DOE  components  in  the 
AT&T  Bell  Laboratories'  looped  digital  optical  pipeline  processor. 

A  third  important  consequence  is  the  Parseval-Plancherel  type  pixel 
identity 

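$$\sum_{(\mu,\mu') \in \mathbb{Z}\oplus\mathbb{Z}} H_1(\psi;\mu,\mu')\,\overline{H_1(\varphi;\mu,\mu')} = \sum_{(\mu,\mu') \in \mathbb{Z}\oplus\mathbb{Z}} \bigl|H_1(\psi,\varphi;\mu,\mu')\bigr|^2$$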


which holds for all complex-valued writing wavelet packet amplitude densities
$\psi(t)\,dt$, $\varphi(t)\,dt$ in the space $\mathcal{S}(\mathbb{R})$. If the Hermite-Gaussian eigenfunctions
$H_m$ and $H_n$ ($m \ge n \ge 0$) are inserted for $\psi$ and $\varphi$, respectively, the radial
symmetry of the terms of the left-hand side implies by a trace argument the
following result [156, 157, 150].


Figure  11.1 

Theorem 11.4. The non-oriented lattices of two-dimensional pixel arrays
in the symplectic hologram plane $\mathbb{R}\oplus\mathbb{R}$ have the crystallographic dihedral
groups $D_k$ ($k \in \{1,2,3,4,6\}$) of order $2k$ as their groups of symmetry.
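
The restriction to $k \in \{1,2,3,4,6\}$ can be made plausible by the classical crystallographic restriction, which is offered here only as an illustration and is not necessarily the trace argument used in the cited references. A rotation of order $k$ that maps a planar lattice onto itself is represented by an integer matrix in a lattice basis, so its trace $2\cos(2\pi/k)$ must be an integer. The sketch below simply scans for the orders that pass this test; the function name and the scan limit are assumptions made for the example.

```python
import math


def admissible_rotation_orders(max_order=24, tol=1e-9):
    """Orders k for which the rotation trace 2*cos(2*pi/k) is an integer."""
    orders = []
    for k in range(1, max_order + 1):
        trace = 2.0 * math.cos(2.0 * math.pi / k)
        if abs(trace - round(trace)) < tol:
            orders.append(k)
    return orders


orders = admissible_rotation_orders()
```

Within the scanned range only the orders 1, 2, 3, 4 and 6 survive, in agreement with the list of dihedral groups in Theorem 11.4.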

An  application  of  the  representations  of  linear  Fraunhofer  type  of  G 
yields  the  following 

Corollary  11.5.  The  diffraction  patterns  of  the  non-oriented  lattices  located 
in  the  hologram  plane  are  the  reciprocal  lattices. 

Figure 11.2

Snowflake fractals, i.e., self-similar planar von Koch curves admitting
locally the symmetry groups $D_k$ ($k \in \{1,2,3,4,6\}$), are called holographic
fractals or holofractals, for short. The validity of the preceding
results can be experimentally demonstrated by means of optical holofractals
read out by the representations of $G$ of linear Fraunhofer type [150, 183,
182]. For the case $k = 3$ of triadic holofractals having Hausdorff dimension
$\log 4/\log 3 = 1.2619\ldots$, a line segment of unit length serves as the initiator
and an equilateral triangle becomes the generator; see Figures 11.1, 11.2, and
11.3. Readout of randomly distorted holofractals by the representations of
$G$ of linear Fraunhofer type generates radial speckle patterns [171, 172].
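
The initiator and generator construction for the triadic case can be sketched in a few lines. This is a minimal illustration, assuming the standard von Koch rule (each segment is replaced by four segments of one third of its length); the function names are chosen for the example.

```python
import math


def koch_step(points):
    """Replace every segment of the polyline by the four-segment generator."""
    cos60, sin60 = 0.5, math.sqrt(3.0) / 2.0
    refined = [points[0]]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = (x1 - x0) / 3.0, (y1 - y0) / 3.0
        a = (x0 + dx, y0 + dy)                  # end of the first third
        b = (x0 + 2.0 * dx, y0 + 2.0 * dy)      # start of the last third
        # Apex of the equilateral triangle: middle third rotated by 60 degrees.
        apex = (a[0] + dx * cos60 - dy * sin60,
                a[1] + dx * sin60 + dy * cos60)
        refined.extend([a, apex, b, (x1, y1)])
    return refined


def koch_curve(depth):
    """Apply the generator `depth` times to the unit-length initiator."""
    points = [(0.0, 0.0), (1.0, 0.0)]
    for _ in range(depth):
        points = koch_step(points)
    return points


def curve_length(points):
    """Total length of a polyline."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


# Four copies scaled by 1/3 give the similarity dimension log 4 / log 3.
similarity_dimension = math.log(4.0) / math.log(3.0)
```

Each refinement multiplies the number of segments by 4 and the total length by 4/3, so the length diverges while the curve stays bounded; the ratio of the two logarithms is the dimension quoted in the text.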

An application of the Weil-Zak isomorphism $w_1$ to the readout formulae
of the Corollary to Theorem 7.1 supra shows that the scanout of the
two-dimensional pixel arrays of the holographic lattices (or hololattices, for
short) may be performed by a time-multiplexing procedure.

Figure 11.3

Remark 11.6. It is a highly remarkable observation of neurophysiology
that the presynaptic vesicular grids of the mammalian brain are hexagonal
hololattices. The thickness of the presynaptic membrane through which the
synaptic vesicles emit their specific neurotransmitter substances is about
5 nm, whereas the uncertainty of the position of a synaptic vesicle due to
the Heisenberg Uncertainty Principle is about 5 nm per millisecond [42,
36]. Of course, this observation and its consequences are also interesting
from the philosophical point of view [43, 197].

Remark 11.7. The hololattices are at the basis of the detour phase method
[158, 163] of writing digital CGHs of sampled images by use of the fast
Fourier transform (FFT) algorithm. The height and the displacement of a
single aperture centered at the sampling points of the hololattice are used to
encode the complex-valued wavelet packet amplitude density including the
phase of the wavefront. Thus the actual encoding of detour phase CGHs is
performed without the explicit use of a reference wavelet. The hololattice
corresponding to the crystallographic group $D_6$ of order twelve
offers substantial computational efficiency and a significant reduction of
required data storage compared with rectangular sampling: the hexagonal
FFT is 25% more efficient than the most efficient rectangular FFT algorithm.
The scanout of the wavefront is achieved when the CGH is illuminated with
a plane wave and focused with a Fourier-transforming lens.
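
The detour phase encoding described in this remark can be sketched as follows. The sketch makes several simplifying assumptions that are not taken from the text: a rectangular (not hexagonal) sampling lattice, a naive discrete Fourier transform in place of the FFT, and one aperture per cell whose relative height is the normalized amplitude and whose lateral shift is the phase divided by $2\pi$, measured in units of the cell width. All function names are illustrative.

```python
import cmath
import math


def dft2(image):
    """Naive two-dimensional discrete Fourier transform of a square array."""
    n = len(image)
    spectrum = [[0j] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0j
            for x in range(n):
                for y in range(n):
                    angle = -2.0 * math.pi * (u * x + v * y) / n
                    total += image[x][y] * cmath.exp(1j * angle)
            spectrum[u][v] = total
    return spectrum


def detour_phase_cells(image):
    """Return, per cell, (aperture height in [0, 1], aperture shift in cell widths)."""
    spectrum = dft2(image)
    peak = max(abs(value) for row in spectrum for value in row)
    if peak == 0.0:
        raise ValueError("image must not be identically zero")
    return [
        [(abs(value) / peak, cmath.phase(value) / (2.0 * math.pi)) for value in row]
        for row in spectrum
    ]


# A single bright pixel at the origin has a flat spectrum with zero phase.
origin_pixel = [[1.0 if (x, y) == (0, 0) else 0.0 for y in range(4)] for x in range(4)]
flat_cells = detour_phase_cells(origin_pixel)

# Moving the pixel by one sample tilts the phase, i.e. shifts the apertures.
moved_pixel = [[1.0 if (x, y) == (1, 0) else 0.0 for y in range(4)] for x in range(4)]
tilted_cells = detour_phase_cells(moved_pixel)
```

No reference wave appears anywhere in the computation: the phase is carried entirely by the position of each aperture inside its cell, which is the point made in the remark.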

Remark  11.8.  Compact  disks  (CDs)  may  be  regarded  as  one-dimensional 
digital  CGHs  that  may  be  scanned  out  by  the  holographic  optical  read-head 
of  a  CD  digital  audio  player.  The  scanning  laser  beam  which  is  focused 
on  the  surface  of  the  CD  is  focused  on  its  return  path  on  a  quadrant  de¬ 
tector  located  near  the  laser  diode  chip.  The  detector  converts  the  arrays 
of  minute  optical  holograms  which  are  coherently  encoded  by  mixing  the 
scanning  beam  with  the  beam  scattered  by  the  pits  into  a  sequence  of  elec¬ 
tric  pulses.  Thus  the  massive  amount  of  information  arising  by  scanning 
the  simple  interference  patterns  of  pits  and  lands  has  to  be  serially  pumped 
off the symplectic hologram plane $\mathbb{R}\oplus\mathbb{R}$ and then fed into the bit-stream
chip or the multi-stage noise shaping (MASH) IC of the CD player's microelectronic
circuitry. It is the focal plane of the collimating lens which forms
the  optoelectronic  von  Neumann  bottleneck  of  the  hybrid  device.  Erasable 
magneto-optic  technology  uses  laser  light  both  to  record  and  to  read  data. 

A blank disk has all its magnetic domains oriented north-pole-down. To
record information, a burst of a few nanoseconds of high-intensity light
from an infrared laser heats a spot about 1 μm across in one magnetic layer
of the disk. The coercive force required to change the magnetic orientation
of all the domains in the spot from north-pole-down to north-pole-up falls
to almost zero as the temperature of the spot increases to 150°C, and the
bias magnetic field created by a coil flips the magnetic field. The data are
read by a lower-powered beam from the same laser, whose polarization depends,
by the Kerr magneto-optic effect, on whether the magnetic orientation
of the spot is north-pole-up or north-pole-down. Optoelectronic ICs in the
magneto-optic write-read-head sense the polarization, and the magnetic
orientation is interpreted as a digital 1 or 0. The magneto-optic switching
technology suggests considering the spin variables of an erasable CD as a
one-dimensional artificial neural network.

The single most important principle in the analysis of electrical circuits
is the principle of linear superposition. For an arbitrary network containing
resistors and voltage sources, we can find the solution for the
network (the voltage at every node and the current through every resistor)
by the following procedure. We find the solution for the network
in response to each voltage source individually, with all other voltage
sources reduced to zero. The full solution, including the effects of all
voltage sources, is just the sum of the solutions for the individual voltage
sources. In addition to linearity of the component characteristics, there
must be a well-defined reference value for voltages (ground), to which all
node potentials revert when all sources are reduced to zero. This principle
applies to circuits containing current sources as well as to those
containing voltage sources. It applies even if the sources are functions
of time. It also applies to circuits containing capacitors, provided that
any initial charge on a capacitor is treated as though it were a voltage
source in series with the capacitor.

Carver  A.  Mead  (1989) 


12.  Amacronics:  the  processing  electronics  layer 


Recall from the theory of electrical networks that a simple closed path in the
plane is called a mesh. A mesh is called hexagonal if it has the dihedral
group $D_6$ as its symmetry group (see Theorem 11.4 supra). Let us assume
that the processing electronics layer is implemented by a linear network of
local resistive circuits and that the voltage is constant around the perimeter
of each concentric hexagonal mesh about the driven node. Consider the $n$th
concentric hexagonal mesh, where all of its $6n$ nodes have the same voltage
$V_n$. On its perimeter there are 6 vertices, and the remaining $6(n-1)$ nodes lie
along the edges. Each of the 6 vertex nodes makes 3 outside interconnections,
while each of the $6(n-1)$ edge nodes makes 2 outside interconnections. Thus
the $n$th hexagonal mesh connects to the $(n+1)$st concentric hexagonal mesh
through $6\cdot 3 + 6(n-1)\cdot 2 = 12n+6$ parallel resistors. Each of the resistors
has resistance $R$ and conductance $R^{-1}$. Therefore the impedance connecting
the $n$th mesh to the $(n+1)$st concentric mesh is $R/(12n+6)$. Similarly the
impedance connecting the $n$th mesh to the $(n-1)$st concentric mesh is
$R/(12n-6)$. Along the $n$th mesh there are $6n$ conductances to ground, making
a net admittance to ground of $6nR_0^{-1}$. According to Kirchhoff's current law,
the current flowing into the $n$th mesh from its neighbours balances the current
flowing out of the $n$th mesh to ground. The forward recursion

$$\frac{V_{n+1}-V_n}{R/(12n+6)} + \frac{V_{n-1}-V_n}{R/(12n-6)} = 6nR_0^{-1}V_n \qquad (n \ge 1)$$

follows. Introducing the parameter $\alpha = RR_0^{-1}$, Kirchhoff's current law takes the more
convenient form [51, 115]

$$(2n+1)V_{n+1} - n(\alpha+4)V_n + (2n-1)V_{n-1} = 0 \qquad (n \ge 1).$$

It describes the voltage on a given hexagonal mesh in terms of the voltages
on the two smaller concentric hexagonal meshes. For any number $w \in \mathbb{C}$ the
identity

$$\sum_{n \ge 1}\bigl(w^n(2n+1)V_{n+1} - w^n n(4+\alpha)V_n + w^n(2n-1)V_{n-1}\bigr) = 0$$

follows. For the power series $G(w) = \sum_{n \ge 1} V_n w^n$, the relations

$$\sum_{n \ge 1} w^{n+1}V_{n+1} = G(w) - wV_1, \qquad \sum_{n \ge 1} w^{n-1}V_{n-1} = G(w) + V_0$$

yield the inhomogeneous linear differential equation of the first order

$$2\Bigl(w^2 - \Bigl(2+\frac{\alpha}{2}\Bigr)w + 1\Bigr)\frac{dG}{dw} + \Bigl(w - \frac{1}{w}\Bigr)G = V_1 - wV_0.$$

The linearity of the ordinary differential equation reflects the linearity of
the network. Decompose the quadratic factor

$$w^2 - \Bigl(2+\frac{\alpha}{2}\Bigr)w + 1 = (w - r_+)(w - r_-)$$

into linear factors. Then the voltage of the first hexagonal mesh takes the form


V, 


in terms of the voltage $V_0$ at the center of the network and the complete
elliptic integrals of the first and second kind, $K(r_1)$ and $E(r_1)$, respectively,
evaluated at the parameter value $r_1$. The arithmetic-geometric mean algorithm
provides an efficient tool to compute $V_1$ in terms of $V_0$ and then to
apply the forward recursion to compute all the voltages $V_n$ ($n \ge 1$).
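
The two computational ingredients named above can be sketched under stated assumptions. The closed form for $V_1$ is not reproduced here, so in this sketch $V_1$ is simply an input. The function `complete_elliptic_k` uses the standard relation $K(k) = \pi / (2\,\mathrm{AGM}(1,\sqrt{1-k^2}))$ for the modulus $k$; the elliptic integral of the second kind is omitted. The function `forward_recursion` applies $(2n+1)V_{n+1} = n(\alpha+4)V_n - (2n-1)V_{n-1}$. All names are illustrative.

```python
import math


def agm(a, b, tol=1e-15):
    """Arithmetic-geometric mean of two positive numbers."""
    while abs(a - b) > tol * abs(a):
        a, b = 0.5 * (a + b), math.sqrt(a * b)
    return 0.5 * (a + b)


def complete_elliptic_k(k):
    """Complete elliptic integral of the first kind for modulus 0 <= k < 1."""
    if not 0.0 <= k < 1.0:
        raise ValueError("modulus k must satisfy 0 <= k < 1")
    return math.pi / (2.0 * agm(1.0, math.sqrt(1.0 - k * k)))


def outward_links(n):
    """Resistors joining the n-th hexagonal mesh to the (n+1)-st one (n >= 1)."""
    return 6 * 3 + 6 * (n - 1) * 2


def forward_recursion(v0, v1, alpha, n_max):
    """Voltages V_0 .. V_n_max from Kirchhoff's law, with alpha = R / R_0."""
    volts = [v0, v1]
    for n in range(1, n_max):
        nxt = (n * (alpha + 4.0) * volts[n] - (2 * n - 1) * volts[n - 1]) / (2 * n + 1)
        volts.append(nxt)
    return volts


k_half = complete_elliptic_k(0.5)
volts = forward_recursion(v0=1.0, v1=0.5, alpha=0.3, n_max=10)
```

Because the recursion also admits a growing solution, a value of $V_1$ that is only slightly off the physically decaying one is amplified from mesh to mesh, which is one reason an accurate evaluation of $V_1$ matters before the recursion is started.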

Although  a  cell's  response  function  is  in  general  nonlinear,  visual  neu¬ 
rophysiologists  have  found  that  for  many  cells,  a  linear  summation 
approximation  is  appropriate. 

Ralph  Linsker  (1988) 


13.  Gabor  wavelets  attached  to  a  hololattice 

In biological vision, the center-surround receptive field profiles of the retinal
neurons [35, 37, 38, 39, 41, 111] and the cells of the lateral geniculate nucleus
are far from forming an orthogonal family in $L^2(\mathbb{R}\oplus\mathbb{R})$. Therefore the
resulting neural representation remains highly correlated. Theorem 7.1 supra
suggests implementing a matching filter bank by an adaptive artificial neural
network model which is based, for $(y,y') \in \mathbb{R}\oplus\mathbb{R}$, on the central projection
and backprojection G-slice orbits

$$g^{\mu,\mu'}_{y,y'} : (x,x') \longmapsto \bigl(U_1(x,y,0) \times U_1(x',y',0)(H_0 \otimes H_0)\bigr)(\mu,\mu')$$

($(\mu,\mu') \in \mathbb{Z}\oplus\mathbb{Z}$) of the Gaussian mode $H_0 \otimes H_0$ in $L^2(\mathbb{R}\oplus\mathbb{R})$. The
irreducibility of the linear Schrödinger representation $U_1$ of $G$ combined with
the Weil-Zak isomorphism $w_1$ implies:

Theorem 13.1. The approximating family of Gabor wavelets

$$\bigl\{\, g^{\mu,\mu'}_{y,y'} \bigm| (\mu,\mu') \in \mathbb{Z}\oplus\mathbb{Z},\ (y,y') \in \mathbb{R}\oplus\mathbb{R} \,\bigr\}$$

attached to the hololattice $\mathbb{Z}\oplus\mathbb{Z}$ inside the symplectic hologram plane $\mathbb{R}\oplus\mathbb{R}$
is total in the complex Hilbert space $L^2(\mathbb{R}\oplus\mathbb{R})$.

Notice that the Gabor wavelets form a non-orthogonal family in the
Hilbert space $L^2(\mathbb{R}\oplus\mathbb{R})$. Expansions in terms of Gabor wavelets offer
high code compression rates appropriate for image processing purposes
[37]. Early stages of biological visual systems pay for keeping m - a - 0
by the non-orthogonality of the center-surround receptive field profiles. The
family of Gabor wavelets gives excellent fits in the chi-squared statistical
sense to the correlating simple cell field profiles empirically studied in the
cat striate cortex [85, 84, 111]. The retina [41], an outpost of the central
nervous system, and the lateral geniculate nucleus, however, act as decorrelators
of the incoming signals. At the level of the mammalian visual cortex,
the introduction of orientation selectivity through localized wave modulation
combined with quadrature phase relations among paired cells results
in a decorrelated neural representation with optimal image compression
performance by the Hilbert basis of $L^2(\mathbb{R}\oplus\mathbb{R})$ of elementary holograms
$(H_1(H_m,H_n;\cdot,\cdot))_{m \ge 0,\, n \ge 0}$. Signal preprocessing and processing in the
auditory parts of the cortex follow similar basic lines.
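
The non-orthogonality of Gabor wavelets can be checked numerically in a simplified setting. The sketch below is one-dimensional and uses assumed conventions that are not taken from the text: the normalized Gaussian $h_0(t) = 2^{1/4}e^{-\pi t^2}$ and the time-frequency shifts $g_{\mu,\nu}(t) = e^{2\pi i \nu t}h_0(t-\mu)$ on the integer lattice. With these conventions neighbouring wavelets have inner product of modulus $e^{-\pi/2}$.

```python
import cmath
import math


def gaussian(t):
    """Gaussian mode normalised so that the integral of |h0|^2 equals 1."""
    return 2.0 ** 0.25 * math.exp(-math.pi * t * t)


def gabor(mu, nu, t):
    """Gaussian shifted by mu in position and by nu in frequency."""
    return cmath.exp(2j * math.pi * nu * t) * gaussian(t - mu)


def inner_product(mu1, nu1, mu2, nu2, half_width=8.0, steps=16000):
    """Midpoint-rule approximation of the L^2 inner product of two Gabor wavelets."""
    h = 2.0 * half_width / steps
    total = 0j
    for i in range(steps):
        t = -half_width + (i + 0.5) * h
        total += gabor(mu1, nu1, t) * gabor(mu2, nu2, t).conjugate()
    return total * h


norm_squared = inner_product(0, 0, 0, 0)
position_neighbours = inner_product(0, 0, 1, 0)
frequency_neighbours = inner_product(0, 0, 0, 1)
```

Both neighbour overlaps come out at about 0.208 in modulus rather than zero, so the family cannot be an orthogonal basis even though it is total.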

I saw my first hologram at the Ontario Science Centre in Canada, and
have been obsessed with holography ever since

Sunny Bains (1987)

the resolution can indeed be very high, since the effective aperture for
the system is not the aperture of the object-bearing system but is instead
the aperture of the other branch of the interferometer

Emmett N. Leith (1986)

14.  Optical  display  holograms  and  superresolution  imaging 

Optical phase conjugation by degenerate four-wavelet-mixing requires a
coherent light source. Due to the degrading effects of coherent artifact
noise, which results from the quantum stationary interference between the
light scattered by imperfections in the optical path and the unscattered light
representing the optical signal, a considerable amount of research interest
has been directed towards the use of partially coherent illumination.

In case of a spatially incoherent light source, the coherent holographic
transform as defined above has to be modified. The canonical differential
2-form on $G/C_G$ has to be replaced by

$$\omega_\nu = \nu \cdot dx \wedge dy \qquad (\nu \neq 0)$$

inherited from the symplectic form $\omega_{\mathcal{O}_\nu}$ of the coadjoint orbit $\mathcal{O}_\nu$ in the
real dual of the Heisenberg Lie algebra. For different center frequencies
$\nu \neq 0$, $\nu' \neq 0$, the symplectic affine planes $(\mathcal{O}_\nu,\omega_\nu)$, $(\mathcal{O}_{\nu'},\omega_{\nu'})$ are different.
Therefore the associated irreducible unitary linear representations $U_\nu$, $U_{\nu'}$
of $G$ are non-isomorphic. Consequently the orthogonality relations

$$\langle H_\nu(\psi,\varphi;\cdot,\cdot) \mid H_{\nu'}(\psi',\varphi';\cdot,\cdot)\rangle = 0 \qquad (\nu \neq \nu')$$

hold on the symplectic plane $\mathbb{R}\oplus\mathbb{R}$ for $\psi, \varphi, \psi', \varphi'$ in $\mathcal{S}(\mathbb{R})$. Instead of the
coherent holographic transform, the transform defined by quantum complex
linear superposition of the mapping

$$\psi(t')\,dt' \otimes \varphi(t)\,dt \longmapsto H(\psi,\varphi;x,y)\cdot dx \wedge dy$$

has to be considered. The form coefficient

$$H(\psi,\varphi;x,y) = \int \nu \cdot H_\nu(\psi,\varphi;x,y)\,d\nu$$

is obtained by integrating over the spatial frequencies $\nu$ emitted by the
light source.


Figure  14.1 
As a first application of the preceding identity, the various kinds of
optical display holograms should be mentioned. A combination of the discretized
frequency scale ($\nu$ scale dual to the $z$ coordinate axis) with the Bragg
frequency selection law for the generated diffractive planar multilayers explains
the volume holograms (à la Denisyuk), the rainbow holograms
(à la Benton), the multiplex holograms (à la Cross), and the waveguide holograms
(à la Caulfield); see Figure 14.1 for the cone of Bragg angles associated
to the multilayered dual manifold of $G$. The holographic angle decoding
procedure can be restated in the following form:

Theorem 14.1. The choice of the hologram plane as a Kirillov orbit $\mathcal{O}_\nu$ ($\nu \neq 0$)
within the multilayered unitary dual manifold of the Heisenberg nilpotent
Lie group $G$ is performed by the Bragg frequency selection law.

Corollary 14.2. The coordinates of a page oriented holographic memory
are given by the coordinates $(x,y) \in \mathbb{R}\oplus\mathbb{R}$ of the hologram plane and the
reference beam angle to $dx \wedge dy$.

Since the slit device in processing Benton holograms defines the axis
directions of the coordinate system in the hologram plane $\mathbb{R}\oplus\mathbb{R}$, a movement
of the illuminating white light source changes the rainbow colours of the
optical display hologram.

One  of  the  most  common  defects  in  optical  display  holograms  is  blur¬ 
ring  of  the  image.  It  is  important  to  appreciate  the  fundamental  difference 
between  optical  holography  and  photography  in  this  respect.  A  photograph 
as  a  two-dimensional  recording  of  an  image  formed  by  a  lens  can  be  blurred 
from  the  start;  the  sharpness  of  the  image  is  not  affected  by  the  light  source 
used  to  illuminate  it.  The  situation,  however,  is  completely  reversed  for  an 
optical  hologram,  which  is  a  recording  of  a  stationary  quantum  interference 
pattern,  not  an  image.  If  the  interference  pattern  is  blurred  at  recording, 
only  the  brightness  of  the  replay  is  affected,  not  the  sharpness  of  the  image. 
The  sharpness  of  the  image  depends  on  the  direction  of  the  light  wavelet 
packet  amplitude  densities  diffracted  by  the  hologram,  which  is  determined 
by  the  spatial  frequency  of  the  recorded  interference  pattern,  and  also  by  the 
direction,  size  and  wavelength  of  the  readout  source.  In  fact  it  is  not  possible 
to  record  an  optical  hologram  of  a  blurred  object.  If  any  optical  hologram 
is  illuminated  with  an  ideal  light  source,  i.e.,  a  point  source  at  the  correct 
wavelength,  angle  and  distance,  then  the  image  will  be  pin  sharp,  no  matter 
how  it  was  recorded. 

The  conclusion  that  if  an  optical  hologram  is  read  out  with  a  point 
monochromatic  source,  then  the  image  would  not  be  blurred  at  all,  is  only 
true  if  the  observer  does  not  look  beyond  the  resolution  of  the  human  eye. 
The  eye  has  only  a  very  small  aperture  and  intercepts  only  a  very  small  cone 
of rays from the hologram at any time. If the image in an optical hologram
is inspected with larger aperture optics, or if a real image is projected on a
screen, the observer will start to see blurring due to the geometric distortion
of the light wavelets emanating from each image point. This arises if the
replay wavelength is different from the recording wavelength (which for a
white-light reflection hologram means any change in layer thickness or
refractive index), or if the replay angle is not exactly equal to the recording angle,
or if a real-image hologram is replayed with an inexact conjugate beam.

Another  application  of  the  preceding  identity  is  the  superresolution 
imaging  technique  [30,  31,  97,  98,  99,  100].  The  superresolution  effect  is 
achieved  by  incoherent  to  coherent  conversion.  In  this  procedure  the  aper¬ 
ture  of  the  imaging  system  is  reduced  and  the  aperture  reduction  is  compen¬ 
sated  by  reducing  the  spatial  coherence  of  the  light  source.  In  view  of  the 
quantum  parallelism,  a  reduced  aperture  like  a  pinhole  spatial  filter  can  be 
inserted  also  in  the  reference  wavelet  channel  of  the  interferometer  without 
limiting  the  effective  resolution  of  the  two-parallel-channel  optical  imaging 
system.  In  the  readout  step,  the  stationary  quantum  interference  pattern 
generated in the symplectic hologram plane $\mathbb{R}\oplus\mathbb{R}$ can be decoded as an
optical  hologram  by  using  an  expanded  laser  beam. 

Neural network models offer a data-driven unsupervised computational
approach which is complementary to the algorithm-driven approaches of
traditional information processing and artificial intelligence. The fine
granularity, massive interconnectivity, and high degree of parallelism
set neural network models apart from traditional electronic serial computing.
These same features are the hallmarks of optical computing
architectures which have led many workers to consider optical implementations
of neural network models.

Bernard H. Soffer (1988)

The resonator memory and novelty filter must be considered as prototypes,
not merely because they are rather primitive by neural network
model standards but also because their relationship to any existing neural
model has yet to be properly established; in several ways, the relationship
is a distant one, at best. Many of the features of these devices are
nevertheless strikingly reminiscent of neural models. In the resonator
memory, for example, it is appropriate to use the term "competition" as
it is used in some neural models

Dana Z. Anderson and
Marie C. Erie (1987)


15.  The  Soffer  optical  resonator 

In order to identify explicitly the terms of the Parseval-Plancherel type pixel
identity indicated above, we denote by $K_{m,n}$ the complete bichromatic graph
of $m+n$ vertices. Define $c(K_{m,n},0) := 1$ and let $c(K_{m,n},l)$ denote the number
of choices of $l \ge 1$ disjoint edges in $K_{m,n}$, each linking two vertices of different
colours. Then

$$\Phi_{m,n}(X) := \sum_{0 \le l \le [(m+n)/2]} (-1)^l\, c(K_{m,n},l)\, X^{m+n-2l}$$

denotes the matching polynomial [47, 60, 77, 162] of variable $X$ associated
to the bipartite graph $K_{m,n}$. For any number $w \in \mathbb{C}$ the radial evaluation of
$\Phi_{m,n}(X)$ at $w$ is defined by the rule

$$\Phi_{m,n}(w) := \sum_{0 \le l \le [(m+n)/2]} (-1)^l\, c(K_{m,n},l)\, w^{m+n}(w\bar{w})^{-l}.$$


Theorem 15.1. The coefficients of the matching polynomial $\Phi_{m,n}(X)$ are the
elementary synaptic strengths $(-1)^l c(K_{m,n},l)$, $0 \le l \le [(m+n)/2]$, where
the matching coefficients $c(K_{m,n},l)$ denote the number of disjoint synaptic
interconnects of the local neural network $K_{m,n}$ ($m \ge n \ge 0$) activated by $l$
coherently firing neurons.


Example 15.2. In the case $m = n = 3$ the matching coefficients

$$c(K_{3,3},0) = 1, \qquad c(K_{3,3},1) = 9, \qquad c(K_{3,3},2) = 18, \qquad c(K_{3,3},3) = 6$$

arise. Thus the matching polynomial of the Thomsen graph $K_{3,3}$ is given by

$$\Phi_{3,3}(X) = X^6 - 9X^4 + 18X^2 - 6.$$

Notice that the local network $K_{3,3}$ forms a non-planar graph.
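
The matching coefficients of this example can be recomputed directly. The sketch below relies on the elementary count $c(K_{m,n},l) = \binom{m}{l}\binom{n}{l}\,l!$ (choose $l$ vertices of each colour, then pair them up), which is a standard fact rather than a formula stated in the text, and cross-checks it by brute-force enumeration of edge sets. The function names are illustrative.

```python
from itertools import combinations
from math import comb, factorial


def matching_count(m, n, l):
    """Number of ways to choose l pairwise disjoint edges in K_{m,n}."""
    return comb(m, l) * comb(n, l) * factorial(l)


def brute_force_count(m, n, l):
    """Same count, by enumerating every l-subset of the m * n edges."""
    edges = [(i, j) for i in range(m) for j in range(n)]
    count = 0
    for chosen in combinations(edges, l):
        left = {i for i, _ in chosen}
        right = {j for _, j in chosen}
        if len(left) == l and len(right) == l:   # no shared vertex
            count += 1
    return count


def matching_polynomial(m, n):
    """Map exponent -> coefficient of the matching polynomial of K_{m,n}."""
    return {
        m + n - 2 * l: (-1) ** l * matching_count(m, n, l)
        for l in range(min(m, n) + 1)
    }


coefficients_k33 = [matching_count(3, 3, l) for l in range(4)]
polynomial_k33 = matching_polynomial(3, 3)
```

For $K_{3,3}$ both counts give 1, 9, 18, 6, and the resulting dictionary encodes $X^6 - 9X^4 + 18X^2 - 6$.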

In terms of Laguerre polynomials of order $m - n \ge 0$ and degree $n \ge 0$,
it follows explicitly [77, 162]

$$\Phi_{m,n}(X) = (-1)^n\, n!\, X^{m-n} L_n^{(m-n)}(X^2) \qquad (m \ge n \ge 0).$$
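
The Laguerre form can be verified coefficient by coefficient with exact rational arithmetic. The sketch assumes the standard expansion $L_n^{(a)}(x) = \sum_{k=0}^{n} (-1)^k \binom{n+a}{n-k} x^k / k!$ of the generalized Laguerre polynomials and the count $c(K_{m,n},l) = \binom{m}{l}\binom{n}{l}\,l!$; neither formula is quoted from the text, and the function names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial


def laguerre_coefficients(n, a):
    """Coefficients c_0..c_n of L_n^(a)(x) = sum_k c_k x^k, integer a >= 0."""
    return [Fraction((-1) ** k * comb(n + a, n - k), factorial(k)) for k in range(n + 1)]


def matching_polynomial_from_laguerre(m, n):
    """Exponent -> coefficient of (-1)^n n! X^(m-n) L_n^(m-n)(X^2), for m >= n."""
    a = m - n
    return {
        a + 2 * k: (-1) ** n * factorial(n) * c
        for k, c in enumerate(laguerre_coefficients(n, a))
    }


def matching_polynomial_direct(m, n):
    """Exponent -> coefficient of sum_l (-1)^l c(K_{m,n}, l) X^(m+n-2l), for m >= n."""
    return {
        m + n - 2 * l: Fraction((-1) ** l * comb(m, l) * comb(n, l) * factorial(l))
        for l in range(n + 1)
    }


identity_holds = all(
    matching_polynomial_from_laguerre(m, n) == matching_polynomial_direct(m, n)
    for m in range(7)
    for n in range(m + 1)
)
```

For $m = n = 3$ the Laguerre side reduces to $-6\,L_3(X^2) = X^6 - 9X^4 + 18X^2 - 6$, the polynomial of Example 15.2.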

By radial evaluation of the matching polynomials $\phi_{m,n}(X)$ defined above,
the next theorem describes the relationship between the elementary holograms
and the matching polynomials attenuated by the Gaussian
$(H_0 \otimes H_0) \in L^2(\mathbb{R} \oplus \mathbb{R})$ with distance: the
farther away an input is from a point in the neural network, the less synaptic
strength it is given.


Theorem 15.3. Let $m \geq n \geq 0$. Then the elementary holograms admit the
form of radially evaluated quasipolynomials

\[ H_1(H_m,H_n;x,y) = \frac{\ldots\,\phi_{m,n}(\sqrt{\pi}\,(x+iy))}{\sqrt{m!\,n!}} \]

for all pairs $(x,y) \in \mathbb{R} \oplus \mathbb{R}$.


Example 15.4. According to the Corollary to Theorem 7.1 supra, scaled versions
of the elementary hologram $H_1(H_0,H_0;\cdot,\cdot) = H_0 \otimes H_0$ can be
implemented as diffraction HOEs (cf. Remark 4.5 supra) for the fundamental
transverse mode of a coherent laser light beam. This implementation is
performed with a CAD station by projecting layers of constant optical thickness
of the rotationally symmetric Gaussian diffraction profile onto the symplectic
hologram plane $\mathbb{R} \oplus \mathbb{R}$. In contrast to the Advanced
Systems Analysis Package (ASAP) software procedure, however, the diffraction
CGH is based on a quantum holographic description of the diffraction profile
and is therefore adapted to the purposes of amacronic sensor technology.

Corollary 15.5. By quantum complex linear superposition, the symplectic
hologram plane $\mathbb{R} \oplus \mathbb{R}$ can be realized as a neural plane
of local neural networks.

In  particular  the  quantum  holographic  approach  to  neural  networks 
yields  the  following  result: 

Theorem 15.6. The intrinsic holographic interconnection patterns are
determined by the representations of linear Schrödinger type whereas the
extrinsic holographic interconnects are determined by the representations of
linear Fraunhofer type of the Heisenberg nilpotent Lie group $G$.

Simulations of the synaptic strength patterns by conventional large-scale
digital computers show that the self-organization of excited neural networks
results in a cluster pattern of the neurons [132]; see Figure 15.1. The
American artist Jackson Pollock was motivated by excited neural networks to
create his drip paintings (cf. Figure 15.2). Notice that the holographic theory
of associative memory also leads one to accord prime place to the neuroglial
cells rather than to modifiable synaptic strengths in planar configurations of
neurons [123].

Presently one of the most successful implementations of the symplectic
hologram plane $\mathbb{R} \oplus \mathbb{R}$ as a neural plane is the Soffer
optical resonator built at the Hughes Research Laboratories [126, 127, 128,
129, 169, 170]. The optical neurocomputer is formed by a coherent optical
resonator cavity consisting of an optical hologram placed between two
degenerate four-wavelet-mixing wavefront conjugate mirrors (PCMs). One of the
wavefront conjugate mirrors is sesquilinear [30, 31, 49, 50] while the other
one amplifies higher amplitude density signals more than lower amplitude
density signals (see Theorem 7.1 supra). The optical hologram has multiple
"example" images stored in it.

The neurocomputer is configured so that each example image is holographically
encoded using a reference laser beam that impinges on the symplectic hologram
plane $\mathbb{R} \oplus \mathbb{R}$ at a slightly different angle than the
reference beams utilized for the other example patterns. After the neural
system has been prepared, one can enter any image into the cavity by impinging
it onto the optical hologram. The net result is that the holographically
encoded image causes partial reconstruction of the reference beams. The
complex-valued wavelet packet amplitude density of each reconstructed reference
beam depends on the distance between the entered image and the example image
associated with the reference: the smaller the distance, the stronger the
reconstructed beam. As the reference beams reverberate through the cavity, the
strongest one (highest complex-valued wavelet packet amplitude density in the
$L^2$ sense) is incrementally amplified and the others are incrementally
attenuated, so that before long only the reference beam corresponding to the
best matching example is left. In other terms, the stored image with the
smallest distance to the input pattern survives in the mode competition at the
expense of the more distant images. At the output port, i.e., the reconstructed
real image port of the optical hologram, the best fitting example pattern then
appears. Thus the optical neurocomputer functions as a nearest neighbour
classifier for holographic imagery by recalling through a competitive memory.
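
The mode competition just described can be imitated by a toy numerical loop.
This is a sketch under stated assumptions, not a model of the optical hardware:
the initial amplitudes are taken to be the non-negative overlaps of the probe
with each stored pattern, the selective mirror is replaced by a hypothetical
power-law gain, and the patterns are $\pm 1$ vectors, for which the largest
overlap coincides with the smallest Euclidean distance.

```python
def correlation(a, b):
    """Inner product of two equally long real-valued patterns."""
    return sum(x * y for x, y in zip(a, b))


def competitive_recall(examples, probe, gain=1.5, rounds=50):
    """Toy winner-take-all loop: amplitudes start as overlaps with the probe,
    are raised to a power > 1 (favouring strong modes) and renormalised."""
    amplitudes = [max(correlation(e, probe), 0.0) for e in examples]
    total = sum(amplitudes)
    if total == 0.0:
        raise ValueError("probe has no positive overlap with any example")
    amplitudes = [a / total for a in amplitudes]
    for _ in range(rounds):
        boosted = [a ** gain for a in amplitudes]
        total = sum(boosted)
        amplitudes = [b / total for b in boosted]
    winner = max(range(len(examples)), key=lambda k: amplitudes[k])
    return winner, amplitudes


if __name__ == "__main__":
    stored = [
        [1, 1, 1, 1, -1, -1, -1, -1],
        [1, -1, 1, -1, 1, -1, 1, -1],
        [1, 1, -1, -1, 1, 1, -1, -1],
    ]
    noisy = [1, 1, 1, -1, 1, -1, 1, -1]  # stored[1] with one entry flipped
    index, amps = competitive_recall(stored, noisy)
    print(index, [round(a, 3) for a in amps])
```

Each round amplifies the strongest amplitude relative to the others, so after a
few iterations essentially all of the normalised amplitude sits on the stored
pattern nearest to the probe.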

The Soffer optical resonator can be viewed as an infinite-dimensional version
of the Hopfield network. Alternatively, if one envisions the optical elements
of the neural system as consisting of small discrete optical units, then the
Soffer optical resonator can be thought of as simply a large Hopfield network.
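
For comparison, a small discrete Hopfield network can be written in a few
lines. The sketch below uses the textbook Hebbian outer-product rule with
asynchronous sign updates; it is meant only to illustrate the finite analogue
mentioned above, and the patterns and function names are hypothetical.

```python
def train_hopfield(patterns):
    """Hebbian outer-product weights with zero diagonal for +/-1 patterns."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j] / n
    return weights


def recall(weights, state, sweeps=10):
    """Asynchronous sign updates until a fixed point or the sweep limit."""
    state = list(state)
    for _ in range(sweeps):
        changed = False
        for i in range(len(state)):
            field = sum(w * s for w, s in zip(weights[i], state))
            new = 1 if field >= 0 else -1
            if new != state[i]:
                state[i] = new
                changed = True
        if not changed:
            break
    return state


if __name__ == "__main__":
    first = [1, 1, 1, 1, -1, -1, -1, -1]
    second = [1, -1, 1, -1, 1, -1, 1, -1]
    probe = [-1, -1, 1, -1, 1, -1, 1, -1]  # second with its first entry flipped
    print(recall(train_hopfield([first, second]), probe) == second)
```

With two orthogonal stored patterns, a probe that differs from one of them in a
single entry is pulled back to that pattern, which is the associative recall
behaviour referred to in the text.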

The second generation of Soffer optical resonators is based on a self-pumped
wavefront conjugate mirror (SP-PCM) in conjunction with an SLM, CCD detector,
frame grabber, and host computer [170]. Similar optical neurocomputers have
also been recently built at the Department of Electrical Engineering of Caltech
[143, 144, 145, 146] and the Joint Institute for Laboratory Astrophysics (JILA)
of the University of Colorado [2, 3, 4, 5, 7]. These neural systems have also
successfully demonstrated recording multiple patterns and functioning as a
nearest neighbour associative memory. The day-dreaming phenomenon observed in a
ring resonator memory reveals the quantum fluctuation as a consequence of the
Heisenberg Uncertainty Principle (cf. Section 5 supra).


{  Schempp 
Mathematical models of neural networks are having a profound influence
on current research in optical computing. This trend toward neural
computing is motivated by the sophisticated control and information
processing that occurs in biological systems. The basic model for a
neural network is a large number of simple processing units (i.e.,
neurons) interacting with one another through weighted interconnections
(i.e., synapses). A neural network is the finest grain parallel computer
possible where information and program are stored in the weighted
interconnections and the processors perform simple thresholding logic. It
is this highly parallel nature that gives neural networks their
computational power and makes them attractive for optical implementation.

H.  John  Caulfield  (1989) 

My  interest  is,  to  paraphrase  a  famous  statement,  not  what  mathematics 
can  do  for  physics,  but  what  physics  can  do  for  mathematics.  That  is 
my  underlying  motive. 

Stanislaw  M.  Ulam  (1986) 


16.  Artificial  neural  network  identities 

Theorem 15.3 supra implies the shift register identity ($m \geq n \geq 0$)

=  +  iu')) 

\/m!n! 

for all points $(\mu,\mu')$ of the quadratic hololattice
$\mathbb{Z} \oplus i\mathbb{Z}$. In particular, the following result obtains:

Theorem 16.1. For $m \geq n \geq 0$ the identity

Y_  (-1 1'"  '  (v^(u  +  iu' ))()>„, „(v'n(u  +  iu')) 

(m.m'  leci'C 

holds for the quadratic hololattice $\mathbb{Z} \oplus i\mathbb{Z}$ of Gaussian
integers inside the symplectic hologram plane $\mathbb{R} \oplus \mathbb{R}$.

The  preceding  theorem  gives  rise  to  the  following  special  identities  for 
the  odd  powers  of  TT  in  terms  of  theta-null  values  fi(0, 1 )  =  [155, 154, 

162]  where 
1:=^:
m  =  1 ,  n  =  0: 

n  =  - r - r 

4I|i^ 

m  =  2,  n  =  1 : 

_  151  (STt^n'*  -  1)e-^"' 

~  32In6e-"^‘^ 

m  =  3,  n  =  2: 

^  _  45I(167f*M^  -  +21  )£-"■■' 

~  64£^iCe-nM’ 

m  =  4,  n  =  3: 

_7  911  numerator  e“"“‘ 

^  ~  1024Ii,i’‘» 

where 

numerator  =  2567rS'^  -  1584071^*  +  1663207rV-<  -  25245. 


Theorem  16.1  supra  shows  that  the  preceding  identities  for  the  theta- 
null  values  9(0, 1 )  are  of  a  combinatorial  character. 

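Remark 16.2. The cardinal sine mother wavelet sinc mentioned in Example 6.4
supra, i.e., the univariate impulse response of the ideal lowpass filter,
admits the Euler factorization

\[ \mathrm{sinc}\, x = \frac{\sin \pi x}{\pi x} = \prod_{n \geq 1} \left( 1 - \frac{x^2}{n^2} \right). \]

Its logarithmic derivative yields the identity

\[ \pi x \cot \pi x = 1 - 2 \sum_{n \geq 1} \left( \frac{x^2}{n^2} + \frac{x^4}{n^4} + \frac{x^6}{n^6} + \cdots \right). \]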

A comparison with the generating function of the Bernoulli polynomials
$B_n(X)$ of degree $n \geq 0$

\[ \frac{w\, e^{wX}}{e^w - 1} = \sum_{n \geq 0} \frac{1}{n!} B_n(X)\, w^n \qquad (w \in \mathbb{C},\ |w| < 2\pi) \]

evaluated at $X = 0$ and $w = 2\pi i x$ yields the classical Euler formulae for
the even powers of $\pi$:

\[ \pi^{2n} = (-1)^{n+1}\, \frac{2\,(2n)!}{2^{2n} B_{2n}}\, \zeta(2n) \qquad (n \geq 1) \]

where $\zeta$ denotes the Riemann zeta-function and $B_{2n} = B_{2n}(0)$ are the
Bernoulli numbers. In particular we get the special cases:

    n = 1:  $\pi^2 = 6\,\zeta(2)$
    n = 2:  $\pi^4 = 90\,\zeta(4)$
    n = 3:  $\pi^6 = 945\,\zeta(6)$
    n = 4:  $\pi^8 = 9450\,\zeta(8)$

The first identity is among the nicest formulae established by Leonhard
Euler. It has been explicitly reproduced in the Encyclopaedia Britannica
(1963).
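
The four constants 6, 90, 945 and 9450 follow from the Bernoulli numbers alone
and can be recomputed exactly. The sketch below is an illustration, not part of
the original remark: it generates $B_n$ from the standard recurrence
$\sum_{k \leq n} \binom{n+1}{k} B_k = 0$ (convention $B_1 = -1/2$), forms the
rational factor $(-1)^{n+1}\, 2\,(2n)! / (2^{2n} B_{2n})$, and compares with a plain
partial sum of the zeta series. All function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial, pi


def bernoulli_numbers(count):
    """B_0 .. B_{count-1} from sum_{k<=n} C(n+1, k) B_k = 0, n >= 1."""
    b = [Fraction(1)]
    for n in range(1, count):
        acc = sum(comb(n + 1, k) * b[k] for k in range(n))
        b.append(-acc / (n + 1))
    return b


def euler_constant(n, b):
    """Rational c_n with pi^(2n) = c_n * zeta(2n)."""
    numerator = Fraction(2 * factorial(2 * n))
    return (-1) ** (n + 1) * numerator / (2 ** (2 * n) * b[2 * n])


def zeta_partial(s, terms=200000):
    """Plain partial sum of the Dirichlet series for zeta(s)."""
    return sum(1.0 / k ** s for k in range(1, terms + 1))


if __name__ == "__main__":
    b = bernoulli_numbers(9)
    print([euler_constant(n, b) for n in range(1, 5)])
    print(abs(6 * zeta_partial(2) - pi ** 2))
```

The partial sum converges slowly for $\zeta(2)$, so the numerical comparison is
only a sanity check; the rational constants themselves are exact.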


Hardly a week goes by without an article appearing on the front page of
a national magazine or journal trumpeting yet another breakthrough in
optical computing.

Lauren P. Silvernail (1990)

Our ability to realize simple neural functions is strictly limited by our
understanding of their organizing principles, and not by difficulties in
implementation.

Carver  A.  Mead  (1989) 


17.  Synopsis 

The computation of real world phenomena in real time requires computational
power that exceeds by many orders of magnitude the capabilities of sequential
digital computers presently available. Although the data transfer rate of
biological neural networks is merely a few kilocycles, hence not very fast,
biological wetware is able to solve tasks such as real-time pattern recognition
or sound localization because it operates in analog mode, which allows
simultaneous summing of many inputs from interconnected units and permits
massively parallel data processing without the need for iterative procedures.
Extrapolation from simulations of simple neural circuits indicates that
a sequential digital computer would have to operate at speeds of more than
$10^{18}$ floating point operations per second in order to match the performance
limit of the human brain. The implementation of artificial neural network
models based on coherent optical processors and analog electronic circuits of
neurons and synapses is currently being pursued in a number of laboratories
where several special purpose neurocomputer systems have been fabricated
in holographic, optoelectronic, or CMOS VLSI electronic components. In the
quantized theory of the electromagnetic field the bosons present in a coherent
light beam travelling in a well-defined direction are the optical photons.
The Kirillov quantization approach to the theory of the sesquilinear
holographic transform
$\psi(t')dt' \otimes \varphi(t)dt \mapsto H_1(\psi,\varphi;x,y) \cdot dx \wedge dy$
as outlined in this paper implies a link between elementary holograms and
artificial neural networks. It allows one to rigorously establish the quantum
parallelism as an EPR-type phenomenon (Theorem 11.3 supra) and to recognize the
symplectic hologram plane $\mathbb{R} \oplus \mathbb{R}$ as a neural plane
(Corollary to Theorem 15.3 supra). It is the quantum theoretical base of the
holographic transform
$\psi(t')dt' \otimes \varphi(t)dt \mapsto H_1(\psi,\varphi;x,y) \cdot dx \wedge dy$
which allows one to model three-dimensional planar optics [81, 89] by the
unitary dual of the Heisenberg nilpotent Lie group $G$ and to establish the
universal validity of the quantum holographic concept from amacronic sensor
technology to classical SAR image processing.

Wer spricht von Siegen? Überstehn ist alles.
(Who speaks of victory? To endure is all.)

Rainer  Maria  Rilke  (1875-1926) 
A procedure is introduced for incorporating into image processing
methods a priori 3-dimensional geometrical information about shapes of objects
of interest. The information is built in by way of probability measures
on deformations of a polyhedral "template." In order to understand the
regularity of the resulting deformations, one needs a theory about the
continuum which consists of probability measures on analogous deformations
of a smooth compact 2-dimensional manifold template. The theory is
constructed via Gaussian measures on an enlargement of the space of triplets
of exact one-forms of $M$, such that with probability 1 the deformations are
continuous (and have additional regularity). The triplets can be viewed as a
"generalized differential map."


1.  Introduction 

One would like to be able to incorporate a priori 3-D geometrical information
into current image processing methods. For example, can one build into the
taking of chest X-rays information concerning the shape of human lungs, or
into angiography the shape of arteries, so that the various calculations which
are typically made, and which are geometrical in nature (e.g., curvatures,
diameter of certain cross sections, enclosed volume, surface area, etc.), can
be automated and made less subjective?

In [2, 3, 1], the approach taken to incorporate geometrical information
is that of a deformable template, where the template is an idealized
prototype of the object of interest. For example, in the above, there would be
lung and artery templates. The parameter space is viewed as having been created
by the application of deformations to the template, very much in the spirit
of having been swept out by a structure group. A prior probability measure
is then placed on such a space of deformations to describe the variations in
form. The data is incorporated via a likelihood which describes the technology
by which the images were acquired. The goal is the construction and
justification of an algorithm for generating realizations from the posterior
measure. When the template is a surface (in $\mathbb{R}^3$) the realizations
from the prior and posterior measures are "random surfaces."

Supported by Office of Naval Research Contract No. …

J. S. Byrnes et al. (eds.), Probabilistic and Stochastic Methods in Analysis with Applications.

In  the  above  referenced  works  the  templates  were  always  polygons  and 
polyhedra.  In  the  present  paper  a  model  in  the  continuum  is  discussed;  the 
necessity  for  such  a  construction  is  the  result  of  such  practical  considerations 
as  how  to  choose  parameter  values  for  the  prior  measure  in  order  to  obtain 
the  desired  regularity  of  the  deformations. 

What properties are important for an appropriate general formulation
of deformations to a template? It seems quite reasonable to think of there
first being a global similarity or affine transformation of the template which
captures the location, orientation, and overall size, followed by local
transformations which capture local structure. If $M$ is the template and
$\{f \in T\}$ are the deformations, then the parameter space is:

\[ \Theta = \{ f(M),\ f \in T \} \qquad (1.1) \]

Ideally, one would like to have the parameter space characterized once one
has specified the template $M$ and several known functions defined on the
template which would describe the allowable local variability/regularity
mix; for example, these functions might specify that in certain regions of
the template there will be little variability (i.e., the template is rather
rigid there), whereas in other regions there might be much variability. Also,
it is most important that the characterization of the deformations of the
template be constructive in the sense that an operational procedure for
actually constructing such deformations is available, as opposed to merely
specifying an equivalence relationship; this is important from an algorithmic
point of view.

In Section 2 a discrete (polyhedral) model is briefly described with
the need for a theory in the continuum being suggested. In Section 3 a
continuous theory is outlined, the full details being given in [5]. In the
continuous model the template is a 2-dimensional submanifold $M$ of
$\mathbb{R}^3$ and the above functions describing the variability/regularity
mix (by which the deformations are created) are the coefficient functions of a
second-order self-adjoint elliptic operator $\mathcal{L}$ on the one-forms of
$M$. From $\mathcal{L}$ is constructed an operator $L$ on an enlargement of the
space of exact one-forms. In this paper $\mathcal{L}$ will be the
Laplace-Beltrami operator on one-forms. Triplets from this enlarged space can
be thought of as identifying a "generalized differential map" from which the
deformations are obtained by "integration": in order to control the resulting
geometry of the deformed template one must work at least at the differential
level.

2.  Polyhedral model

In the case of a polyhedral template the resulting geometry of the deformed
polyhedral template is more naturally controlled by deforming edges rather
than vertices. Heuristically speaking, the deformations will be constructed
by first constructing their "differential maps." If $P$ is a polyhedral
template and $\{v_1,\ldots,v_m\}$ and $\{e_1,\ldots,e_n\}$ are, respectively,
the vertices and edges of $P$, let matrices
$(S_1,\ldots,S_n) \in [GL(3;\mathbb{R})]^n$ be "placed" on the $n$ edges ($S_i$
on edge $i$), vertex $v_1$ held fixed (for the moment). The matrices should be
thought of as being "close" to the identity:

\[ S_i = I + (S_i - I). \qquad (2.1) \]

The $S$'s are the discrete analogue of the differential map, and the
deformation $f$ resulting from $(S_1,\ldots,S_n)$ is defined as follows: for
$v \in \{v_1,\ldots,v_m\}$,

\[ f(v) = v_1 + \sum_{j=1}^{k} S_{i_j} e_{i_j} = v + \sum_{j=1}^{k} (S_{i_j} - I)\, e_{i_j}, \qquad (2.2) \]

where $(e_{i_1},\ldots,e_{i_k})$ is a path from $v_1$ to $v$. In order for this
to make sense $f(v)$ must be independent of the path taken from $v_1$ to $v$.
The independence of the path corresponds to the linear closure constraints
imposed by the fixed edges of the polyhedral template. The parameter space in
the polyhedral case is:

\[ \Theta_n = \{ f(P) \mid f \text{ created by (2.2)},\ (S_1,\ldots,S_n) \in [GL(3)]^n,\ \text{with path independence} \} \qquad (2.3) \]
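
Expression (2.2) and the closure constraints can be made concrete on the
smallest possible example. The sketch below is illustrative only: it assumes a
single triangular face with directed edges $e_0, e_1, e_2$ forming a closed
loop, represents a path as a list of (edge index, orientation) steps, and uses
hypothetical function names.

```python
def mat_vec(a, x):
    """Product of a 3x3 matrix (list of rows) with a 3-vector."""
    return [sum(a[r][c] * x[c] for c in range(3)) for r in range(3)]


def deform(v1, edges, matrices, path):
    """f(v) = v1 + sum of S_i e_i along a path of (edge index, +1/-1) steps."""
    out = list(v1)
    for idx, sign in path:
        step = mat_vec(matrices[idx], edges[idx])
        out = [o + sign * s for o, s in zip(out, step)]
    return out


def closure_residual(edges, matrices, loop):
    """Sum of deformed edges around a closed loop; zero means the linear
    closure constraint (path independence) holds for that loop."""
    return deform([0.0, 0.0, 0.0], edges, matrices, loop)


if __name__ == "__main__":
    v1 = [0.0, 0.0, 0.0]
    # Directed edges v1->v2, v2->v3, v3->v1 of a triangle in R^3.
    edges = [[1.0, 0.0, 0.0], [-1.0, 1.0, 0.0], [0.0, -1.0, 0.0]]
    shear = [[1.0, 0.5, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 2.0]]
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    double = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
    loop = [(0, 1), (1, 1), (2, 1)]

    same = [shear, shear, shear]
    print(deform(v1, edges, same, [(0, 1), (1, 1)]))   # v1 -> v2 -> v3
    print(deform(v1, edges, same, [(2, -1)]))          # v1 -> v3 directly
    print(closure_residual(edges, same, loop))
    print(closure_residual(edges, [double, identity, identity], loop))
```

When all three matrices coincide the two paths to $v_3$ agree and the loop
residual vanishes, which is the traditional affine setting. With different
matrices on different edges the residual is in general non-zero, so the closure
constraints have to be imposed explicitly.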

If the matrices $S_i$, $i = 1,\ldots,n$, were all the same matrix then we would
be in the setting of the traditional geometries (Euclidean, Similarity, Affine
groups); that is, $\Theta_n$ would be the orbit of figures equivalent (in the
geometry) to the template. One only moves away from these traditional
definitions of equivalence by allowing the matrices to be different on
different edges. Loosely speaking, we will characterize shapes via invariants
not ordinarily obtained by traditional geometries by judiciously "lumping"
together orbits under traditional geometries.

Consider the following (simple) Gauss-Markov measure defined on
$[GL(3)]^n$ via the density:

Pl-G . s„  )  > 

n  I.Si,  -  1)  +  lls,,  -  (2.4) 

'  i  1  .»2  I 

where the product is over all neighboring edges in the edge graph associated
with the polyhedral template. There are just two parameters, p and a. More
precisely, we have a matrix-valued Gauss-Markov Random Field defined on
the edge graph associated with the polyhedral template. On closer examination
the covariance structure of this measure is seen to be related to the
Green's function for the discrete Laplacian on the edge graph. One must
also condition on the $S$'s, when applied to the fixed edges of the template,
producing deformed edges which "come back together"; these are the linear
closure constraints which are equivalent to path independence. Conditioning
the measure in expression (2.4) on these linear constraints, one obtains
another Gaussian measure from which one can induce a measure on $\Theta_n$;
call it $\mu_n$. The measure $\mu_n$ is our prior measure in the polyhedral
case.

In practice one doesn't use just one polyhedral template but rather a
sequence of polyhedral templates, allowing one to work at different scales;
this sequence can be thought of as piecewise-flat approximations to a smooth
manifold template $M$. As the polyhedral template approximations are refined,
the covariance structure in expression (2.4) gets closer to (a function
of) the inverse of the discrete Laplacian (on the edge graph), which one
would suspect gets closer to the Green's function for the two-dimensional
Laplacian, which because of its logarithmic singularity is only realizable
as a covariance function on generalized functions. In our case the above
covariance structure is for a measure on the "differential map" and not on
the deformation itself, which would be obtained by an "integration." An
important question is whether or not, in the continuum, the analogue of
$f(\cdot)$ in expression (2.2) with the analogue of the above probability
measure $\mu_n$ produces continuous deformations with probability one. This
question is important because if one is to work at different scales it is
crucial to know whether chaotic behaviour at a fine scale is due to an improper
choice of parameters or to an inadequate theory (such as a certain probability
measure only being realizable on generalized functions). In Section 3 it is
shown that, in the continuum, the deformations are continuous (and more) with
probability one.

What should be the analogue of the above discrete model in the continuum?
If one thinks of the three rows of the matrices $(S_i - I)$,
$i = 1,2,\ldots,n$, in expression (2.1) as discrete versions of one-forms, then
in expression (2.2), $f(\cdot)$ is created by "integrating" three one-forms
over the path $(e_{i_1},\ldots,e_{i_k})$ from $v_1$ to $v$. The independence of
the paths, which corresponds to the imposition of the linear closure
constraints, is the discrete analogue of the one-forms being exact.

Consequently, a reasonable generalization of (2.1)-(2.4) would start
with a smooth, compact, connected, oriented 2-dim submanifold $M$ of
$\mathbb{R}^3$ as template with $B^1(M)$ being the exact one-forms of $M$. The
matrices $(S_1,\ldots,S_n)$, $S_i = I + (S_i - I)$, $i = 1,\ldots,n$, the
discrete analogue of a differential map, would be replaced by
$(I + (\alpha_1,\alpha_2,\alpha_3))$, $\alpha_i \in B^1(M)$, $i = 1,2,3$. For
$\alpha_i \in B^1(M)$, $i = 1,2,3$, and $p_0$ (fixed) $\in M$, define
$f(\cdot)$ associated with $(\alpha_1,\alpha_2,\alpha_3)$ as: for $p \in M$,

\[ f(p) = p + \left( \int_{\gamma:\, p_0 \to p} \alpha_i(\gamma'(s))\, ds,\ i = 1,2,3 \right) \qquad (2.5) \]

where $\gamma$ is a path from $p_0$ to $p$ in $M$. The parameter space $\Theta$
is thus:

\[ \Theta = \{ f(M) \mid f \text{ created by (2.5)},\ \alpha_i \in B^1(M),\ i = 1,2,3 \}. \qquad (2.6) \]
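
The role of exactness in (2.5) can be illustrated numerically: for an exact
one-form $\alpha = dh$ the line integral depends only on the endpoints. The
sketch below is an assumption-laden toy, not the construction of the paper. It
takes $M$ to be the unit sphere, uses a hypothetical potential
$h(x,y,z) = x + 2y + xz$ for a single component $\alpha_i$, and compares two
different paths from $p_0 = (1,0,0)$ to $p = (0,1,0)$.

```python
from math import cos, sin, pi


def grad_h(x, y, z):
    """Gradient of the illustrative potential h(x, y, z) = x + 2y + xz."""
    return (1.0 + z, 2.0, x)


def line_integral(path, steps=2000):
    """Midpoint-rule approximation of the integral of dh(gamma'(s)) ds
    along a path parametrised by s in [0, 1]."""
    total = 0.0
    prev = path(0.0)
    for k in range(1, steps + 1):
        cur = path(k / steps)
        mid = [(a + b) / 2 for a, b in zip(prev, cur)]
        g = grad_h(*mid)
        total += sum(gi * (b - a) for gi, a, b in zip(g, prev, cur))
        prev = cur
    return total


def equator(s):
    """Quarter of the equator from (1, 0, 0) to (0, 1, 0)."""
    a = s * pi / 2
    return (cos(a), sin(a), 0.0)


def over_the_pole(s):
    """From (1, 0, 0) up to the north pole, then down to (0, 1, 0)."""
    if s <= 0.5:
        a = s * pi
        return (cos(a), 0.0, sin(a))
    a = (s - 0.5) * pi
    return (0.0, sin(a), cos(a))


if __name__ == "__main__":
    print(line_integral(equator), line_integral(over_the_pole))
```

Both integrals agree with $h(p) - h(p_0) = 1$ up to rounding. Repeating this with
three potentials would give the three components of $f(p) - p$ in (2.5).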

One would first put a Gaussian measure on the space of one-forms with the
covariance operator being the inverse of the Laplacian on one-forms; this
is the analogue of the measure in expression (2.4). From this one would
create the conditional measure, conditioning on the form being exact; this is
the analogue of conditioning on the closure constraints. Finally, using this
measure, one would induce a measure on the space $\Theta$. Unfortunately, such
measures cannot be realized on these spaces but only on "larger" spaces.
In the next section we show that, probabilistically, this is inconsequential to
our goal.


3.  Smooth  manifold  template 

Let $(M,g)$ be a smooth, 2-dim compact, connected, oriented submanifold
of $\mathbb{R}^3$ where $g$ is the Riemannian metric inherited from the dot
product on $\mathbb{R}^3$; $M$ is our template. For the results presented in
these proceedings the extrinsic geometry of $M$ will not be exploited.
Throughout this section all structure will be $C^\infty$ unless otherwise
specified. Let $\{U_\ell,\phi_\ell\}_\ell$ be a finite atlas on $M$ with a
subordinate partition of unity $\{U_\ell,h_\ell\}_\ell$. Let $A^k(M)$ be
the $k$-forms on $M$, $k = 0,1,2$ and denote by $(\cdot,\cdot)_0^{(k)}$ the
inner product:

\[ (\alpha,\beta)_0^{(k)} = \int_M \alpha \wedge *\beta \]

where $\wedge$ and $*$ are the wedge and Hodge-star operators. The usual $L^2$
space is $A^k(M)$ completed with respect to $(\cdot,\cdot)_0^{(k)}$. Consider
the family of norms $\{p_m^{(k)}\}_{m \geq 0}$ on $A^k(M)$: for
$\alpha \in A^k(M)$,

\[ p_m^{(k)}(\alpha) = \sum_\ell \| \phi_\ell^* (h_\ell \alpha) \|_{m,k} \qquad (3.1) \]

where $\phi_\ell^*$ is the induced map from $A^k(U_\ell)$ to the components of
the form viewed as being defined on $\phi_\ell(U_\ell)$ and $\|\cdot\|_{m,k}$ is
the usual Sobolev norm $\|\cdot\|_m$ on $D(\mathbb{R}^2)$: for
$g \in D(\mathbb{R}^2)$,

\[ \|g\|_m^2 = \sum_{|\nu| \leq m} \int |D^\nu g|^2\, dx \]

applied to the $L(k)$ components of the $k$-form, $L(k) = 1$ for $k = 0$ or 2
and $L(1) = 2$ (see [7] or [6]). It is shown in [4] that the family of separable
Hilbertian norms $\{p_m^{(k)}\}$ determines the Schwartz topology on $A^k(M)$
and that the resulting countably Hilbertian space is nuclear.

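Let $d^{(k)}$ and $\delta^{(k)}$ be the exterior derivative and boundary
operator on $\{A^k(M),\ k = 0,1,2\}$ and define the Laplace-Beltrami operator
as:

\[ \Delta^{(k)} = d^{(k-1)}\delta^{(k)} + \delta^{(k+1)} d^{(k)}. \]

Let $B^k(M)$ denote the exact $k$-forms, and consider likewise the co-exact and
harmonic $k$-forms. The operator $\Delta^{(k)}$ is invertible on the direct sum
of the exact and co-exact $k$-forms, and the inverse, denoted by $G^{(k)}$, can
be extended to $A^k(M)$ (w.r.t. $L^2$) by being set to zero on the harmonic
$k$-forms. Let $\Pi^{(1)}$ be the projection in $L^2$ to the completion in $L^2$
of $B^1(M)$.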

The full details for the following lemmas and theorem are given in [5].

Lemma 3.1. For the operator $L$ given by

\[ L = \Pi^{(1)} G^{(1)} \Pi^{(1)} \]

there exists an $m \geq 3$ such that the seminorm $p_L(\cdot)$:

\[ p_L(\alpha) = (\alpha, L\alpha)_0^{1/2} \]

is the covariance functional for a mean zero Gaussian measure on
$(A^1(M), p_m^{(1)})'$ concentrated on $(B^1(M), p_m^{(1)})'$, where $'$
denotes dual.

Remark 3.2. The measure in Lemma 3.1 is that obtained by first constructing
a Gaussian measure on $(A^1(M), p_m^{(1)})'$ with covariance functional
$p_G(\alpha) = (\alpha, G^{(1)}\alpha)_0^{1/2}$, the existence of $p_m^{(1)}$
and the Gaussian measure being the result of $G^{(1)}$ being continuous w.r.t.
$L^2$ and known existence results [4]. The operator $L$ is shown to be
well-defined by using various results from the Hodge-de Rham decomposition
theory and various properties of $G$, $\delta$ and $d$. The operator $L$ comes
about because the condition $(I - d^{(0)}\delta^{(1)}G^{(1)})(\alpha) = 0$,
$\alpha \in A^1(M)$, is equivalent to $\alpha \in B^1(M)$. The seminorm
$p_L(\cdot)$ is shown to be Hilbert-Schmidt weaker than $p_m^{(1)}$. In [5] the
conditioning is made precise. The measure in Lemma 3.1 is the "proper"
realization of that suggested in Section 2: first obtain a Gaussian measure
whose covariance operator is the inverse of the Laplacian on one-forms, then
condition on the form being exact.

Lemma 3.3. If $\eta : TM \times B^1(M) \to \mathbb{R}$ is given by

\[ \eta(p,Z_p)(\alpha) = \alpha_p(Z_p), \qquad \alpha \in B^1(M),\ (p,Z_p) \in TM, \]

and if $\{\gamma(s),\ 0 \leq s \leq t\}$ is a path in $M$, then the one
parameter family $\{\eta(\gamma(s),\gamma'(s)),\ 0 \leq s \leq t\}$ is
continuous in $(B^1(M), p_m^{(1)})'$.
Remark 3.4. Because of the nature of $\eta$, continuity in the dual space norm
involves (a sup over) comparisons of one-forms evaluated at different points
of $M$; everything has to be parallel transported into the cotangent space at
one point (that where continuity is to be shown, $\gamma(s)$). A Sobolev
imbedding argument obtains boundedness of derivatives up to $m$ ($m \geq 3$)
for the components in local coordinates of $\alpha$, $p_m^{(1)}(\alpha) \leq 1$.
Using this boundedness, the continuity of the flow for parallel displacement,
and the fact that the Levi-Civita connection is metric, the result is obtained.


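Theorem 3.5. Let $Y_i$, $i = 1,2,3$, be independent Gaussian continuous linear
random functionals from Lemma 3.1, extended to $\overline{(B^1(M), p_L)}$, the
completion w.r.t. $p_L$:

\[ Y_i : \overline{(B^1(M), p_L)} \to L^2(\Omega, \mathcal{A}, P). \]

For $p \in M$, $\gamma = \{\gamma(s),\ 0 \leq s \leq t\}$ a path from $p_0$
(fixed) to $p$, and $\omega \in \Omega$, let

\[ f(p,\omega) = p + \left( \int_0^t Y_i(\eta(\gamma(s),\gamma'(s)))(\omega)\, ds,\ i = 1,2,3 \right). \qquad (3.2) \]

Then with probability one, $f(p,\omega)$ does not depend on the chosen path (for
all $p$) and $f(\cdot,\omega) : M \to \mathbb{R}^3$ is continuous. Defining
$(Tf)(p,\omega) : T_pM \to T\mathbb{R}^3$ as:

\[ (Tf)(p,\omega)(Z_p) = \left( f(p,\omega),\ Z_p + (Y_i(\eta(p,Z_p))(\omega),\ i = 1,2,3) \right) \qquad (3.3) \]

then for $\{\xi(s),\ -\epsilon < s < \epsilon\}$, $\xi(0) = p$, $\xi'(0) = Z_p$,

\[ (Tf)(p,\cdot)(Z_p) = \lim_{s \to 0} \frac{f(\xi(s),\cdot) - f(p,\cdot)}{s} \qquad (3.4) \]

the limit being in $L^2(\Omega, \mathcal{A}, P)$, and w.p.1 $(Tf)$ is a linear
map for all $p \in M$ and

\[ P[\, \omega \mid \nu\{ p \mid \dim(Tf(p,\omega)(T_pM)) < 2 \} = 0 \,] = 1 \qquad (3.5) \]

where $\nu$ is the volume element associated with $g$.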


Remark 3.6. We will just give the flavor of the proof of the first part,
expression (3.2); the other parts follow by similar arguments. Because of
Lemma 3.3, the Cauchy-Bochner integral

\[ \int_{\gamma:\, p_0 \to p} \eta(\gamma(s),\gamma'(s))\, ds \]

is a well-defined element of $(B^1(M), p_m^{(1)})'$ and for $\alpha \in B^1(M)$

\[ \left( \int_{\gamma:\, p_0 \to p} \eta(\gamma(s),\gamma'(s))\, ds \right)(\alpha) = \int_{\gamma:\, p_0 \to p} \alpha(\gamma'(s))\, ds \]

is independent of the path from $p_0$ to $p$. The integral in expression (3.2)
is also a well-defined Cauchy-Bochner integral and the linearity of $Y$
(between Banach spaces) results in the independence of the integral from the
path chosen. Differences of integrals now reduce to integrals over "the
path-difference of paths" which can be chosen to be a minimizing geodesic
segment. Developing various bounds, one can eventually apply (a form of
the) Kolmogorov continuous version theorem from which the continuity of
$f(\cdot,\omega)$ follows.

In the polyhedral model, there was a polyhedral template $T$ and matrices
$(S_1, \ldots, S_n)$, $S_i$ "sitting" on edge $i$, where $S_i = I + (S_i - I)$ was designed
to be a discrete analogue of a differential map. In the continuum the idea was
to replace $\{S_i - I, i = 1, \ldots, n\}$ with a triplet of exact one-forms. Expression
(3.3) is the "proper" realization of this, where $(\alpha_1, \alpha_2, \alpha_3)$ is replaced by
$\{(Y_i(\tau_i(p, Z_p))(\omega), i = 1,2,3), p \in M\}$ for $\omega \in \Omega$. Expression (3.4) shows
that $(Tf)$, written in suggestive notation, acts (in an $L^2$ sense) much like a
differential map. The deformations in the polyhedral model, expression (2.2),
involved sums; in the continuum, the idea was to replace this with integrals
of the triplets of exact one-forms, which, again, is properly realized by $f$ in
expression (3.2). The theorem shows that with probability one the image of
$M$ under $f$ is continuous. One would like more regularity, at least locally.
It would be impossible to rule out self-intersections globally; it is hoped
that the data would impose such consistencies. Ideally, one would like the
images of $M$ under $f$ to be immersed submanifolds. Expression (3.5) can be
thought of as a version of this. It says that with probability one the set
of points $p$ ($\in M$) where the image of $T_pM$ under $(Tf)$ is not of full rank has
volume measure zero.

4.  Summary 

In  this  paper  both  polyhedral  and  smooth  manifold  deformable  template 
models  were  presented.  The  framework  of  Sections  2  and  3  will  allow  one  to 
formulate  and  answer  such  questions,  based  upon  observed  noisy  images,  as 
how  to  construct  good  estimates  of  such  geometrical  entities  as  curvature  and 
the  location  of  its  extrema,  surface  area,  enclosed  volume,  and  the  diameter 
of a certain cross section of the object (or objects) in the images.

Also one should now be able to analyze how to choose and adapt
(as $n \to \infty$) parameters for the sequence (in Section 2) of measures $\mu_n$
on the spaces 0(T^n), where T^n is the polyhedral template at refinement
stage $n$, as $n \to \infty$, since in Section 3 for the smooth manifold template
$M$ we constructed 0(M) and a measure $\mu$ (Theorem 3.5) which should be
the "limits" of 0(T^n) and $\mu_n$. Also it appears that the Laplace-Beltrami
operator can be replaced by more general self-adjoint elliptic operators on
the space of one-forms with the details going through; the parameters p and
ff in expression (2.4) are replaced by functions.

Applications  of  the  methods  of  this  paper  are  currently  underway  to 
such  problems  as  the  estimation  of  the  biparietal  diameter  of  the  fetal  head 
based  upon  multiple  ultrasound  images  (the  biparietal  diameter  is  a  good 
measure of fetal growth) and the estimation of volumes in medical imaging.

We begin with a historical survey, describing sources and content of the
fundamental papers of Paley and Zygmund about random trigonometric
and Taylor series. Then, in a discussion of the recent and current research
in this fertile area, we emphasize local properties of Brownian motion and
some of its applications, the present theory of random trigonometric and
Taylor series and some applications, Rademacher series in Banach spaces,
Sidon sets, Riesz products, the Pisier algebra, and random coverings.


1. Our starting point

Under the cryptic title "On some series of functions," Paley and Zygmund
published three papers, in 1930 and 1932, in the Proceedings of the
Cambridge Philosophical Society. They were anticipated by two papers of
Zygmund on lacunary trigonometric series in 1930, and followed in 1933 by a
short article of Zygmund on continuability of power series, and a common
work of Paley, Wiener and Zygmund, "Notes on random functions" (46, 47,
60, 42, 44, 65, and 67 in the bibliography of Zygmund's Selected Papers).
The main content and the continuation of the paper of P.W.Z. can be found
in the last chapters of the book of Paley and Wiener, Fourier transforms in
the complex domain (1934).

The P.Z. papers consider series of the form

$$\sum c_n \varepsilon_n(\omega) f_n(t) \qquad (1.1)$$

where $\varepsilon_n(\omega)$ are Rademacher functions, that is, essentially, independent
random variables taking the values $+1$ and $-1$ with probability 1/2 and, in
particular, random trigonometric series

$$\operatorname{Re} \sum_{0}^{\infty} c_n \varepsilon_n(\omega) e^{int} \qquad (1.2)$$

and random Taylor series

$$\sum_{0}^{\infty} c_n \varepsilon_n(\omega) z^n \qquad (1.3)$$

and the analogues when the $\varepsilon_n(\omega)$ are replaced by $e^{2\pi i \omega_n}$, where the $\omega_n$
are independent random variables whose distribution is the Lebesgue
measure on $[0,1]$. We call (1.1), (1.2), (1.3) Rademacher series, Rademacher
trigonometric series, Rademacher Taylor series and

$$\sum c_n e^{2\pi i \omega_n} f_n(t) \qquad (1.4)$$

$$\operatorname{Re} \sum_{0}^{\infty} c_n e^{2\pi i \omega_n} e^{int} \qquad (1.5)$$

$$\sum_{0}^{\infty} c_n e^{2\pi i \omega_n} z^n \qquad (1.6)$$

Steinhaus, Steinhaus trigonometric and Steinhaus Taylor series.

The paper of P.W.Z. introduces Gaussian series

$$\sum_{0}^{\infty} c_n \xi_n f_n(t) \qquad (1.7)$$

where the $\xi_n$ are independent Gaussian normalized random variables
(actually, it would have been fair to call them Wiener series), in particular
Gaussian trigonometric series

$$\sum_{0}^{\infty} c_n (\xi_n \cos nt + \xi_n' \sin nt) \qquad (1.8)$$

($(\xi_n)$ and $(\xi_n')$ being two independent normal sequences) and, in a slightly
different form, Gaussian Taylor series

$$\sum_{0}^{\infty} c_n \zeta_n z^n \qquad (1.9)$$

where $(\zeta_n)$ is a complex normal sequence, for example $\zeta_n = \xi_n + i\xi_n'$.

The case $c_n = 1/n$ in (1.8) is essentially the Fourier-Wiener representation of
Brownian motion, and the paper is concluded by a proof of the almost sure
nowhere differentiability of the Wiener function.
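
To make the three families concrete, here is a minimal Python sketch that evaluates partial sums of the trigonometric series (1.2), (1.5) and (1.8) at a few points. The coefficient choice $c_n = 1/(n+1)$, the truncation at 64 terms, the fixed seed and all function names are assumptions made for illustration only; they do not come from P.Z. or P.W.Z.

```python
import cmath
import math
import random


def rademacher_signs(n_terms, rng):
    """Independent signs, +1 or -1 with probability 1/2 each."""
    return [rng.choice((-1, 1)) for _ in range(n_terms)]


def steinhaus_phases(n_terms, rng):
    """Factors exp(2*pi*i*omega_n), omega_n independent and uniform on [0, 1)."""
    return [cmath.exp(2j * math.pi * rng.random()) for _ in range(n_terms)]


def trig_partial_sum(coeffs, factors, t):
    """Re sum_n c_n * factor_n * exp(i*n*t): a partial sum of (1.2) or (1.5)."""
    total = 0j
    for n, (c, factor) in enumerate(zip(coeffs, factors)):
        total += c * factor * cmath.exp(1j * n * t)
    return total.real


def gaussian_trig_partial_sum(coeffs, xi, xi_prime, t):
    """sum_n c_n * (xi_n cos(nt) + xi'_n sin(nt)): a partial sum of (1.8)."""
    return sum(
        c * (x * math.cos(n * t) + y * math.sin(n * t))
        for n, (c, x, y) in enumerate(zip(coeffs, xi, xi_prime))
    )


rng = random.Random(0)  # fixed seed, only for reproducibility
N = 64
coeffs = [1.0 / (n + 1) for n in range(N)]  # illustrative choice of c_n
signs = rademacher_signs(N, rng)
phases = steinhaus_phases(N, rng)
xi = [rng.gauss(0.0, 1.0) for _ in range(N)]
xi_prime = [rng.gauss(0.0, 1.0) for _ in range(N)]

for t in (0.0, 1.0, 2.0):
    print(
        t,
        trig_partial_sum(coeffs, signs, t),
        trig_partial_sum(coeffs, phases, t),
        gaussian_trig_partial_sum(coeffs, xi, xi_prime, t),
    )
```

The same `trig_partial_sum` serves for both the Rademacher and the Steinhaus case because only the random unimodular factors differ.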

We shall see that Rademacher series (1.1) play a fundamental role in
the study of series of functions whose coefficients are independent random
variables, in particular series (1.4) and (1.7). Steinhaus series (1.5) and
(1.6) are more tractable than the Rademacher series (1.2) and (1.3) because
they are series of random translates; if they enjoy a property almost surely
on an interval of values of $t$, the same holds on the whole circle. Gaussian
series (1.8) are a way to study stationary Gaussian processes, and conversely
Gaussian stationary processes are a way to study series (1.8); therefore, a
difficult question like uniform convergence of random Fourier series is more
tractable for Gaussian than for Steinhaus or Rademacher series.

On  the  other  hand,  Rademacher  series  (1.2)  and  (1.3)  were  used  from 
the  very  start  in  order  to  provide  examples  and  counterexamples  in  the 
theories  of  Fourier  series  and  analytic  functions  in  the  unit  disc.  This  is  still 
the  case  now,  and  we  shall  discuss  a  few  instances. 

The  introduction  to  P.W.Z.  has  a  historical  character.  It  describes  two 
streams  of  ideas,  from  which  P.Z.  on  the  one  hand  and  the  work  of  Wiener 
on  Brownian  motion  on  the  other  are  born,  and  it  indicates  that  "the  purpose 
of  this  paper  is  to  bridge  the  gap  between  the  P.Z.  and  the  W.  theories." 
Actually the purpose was not achieved, and it has remained a permanent
source of inspiration ever since.

2.  The  two  streams 

Before  going  back  to  P.Z.  and  P.W.Z.,  let  me  develop  the  historical  part, 
which  is  very  much  in  the  spirit  of  this  conference.  The  probability  theory 
of  the  twentieth  century  relies  on  totally  additive  measures  and  Lebesgue 
integration.  It  arose  from  very  specific  questions,  and  the  two  main  streams 
are  associated  with  two  names:  Borel  and  Einstein. 

2.1.  From  Borel  to  P.Z. 

"The  introduction  of  the  notion  of  random  into  analysis  is  in  the  first  instance 
the work of Borel." This is the first sentence of P.W.Z., and they quote the
theory of probabilités dénombrables, as expounded by Borel in Rendiconti
di Palermo in 1909. Along the lines of questions considered by Borel they
mention  Rademacher  (the  so-called  Rademacher  functions  were  introduced 
in  1922)  and  Steinhaus.  "To  Steinhaus  in  particular  is  due  the  reduction  of 
such  questions  to  questions  concerning  the  Lebesgue  integral." 

Let us be more specific. The problem of analytic continuation of a
function defined by a Taylor series, raised by Weierstrass in 1880 (Zur
Functionenlehre, Monatsberichte), became a very popular question in France in
the 1890's. Following a seminal paper of Poincaré in 1892 ("Sur les fonctions
à espaces lacunaires", Amer. J. Math.), and Hadamard's thesis on the relation
between the coefficients of the Taylor series and the singularities of the
function ("Essai sur l'étude des fonctions données par leur développement de
Taylor", Journal Math. pures et appliquées, 1892), Borel wrote his thesis on
a problem of continuation (not necessarily analytic continuation!) of some
analytic functions ("Sur quelques points de la théorie des fonctions", Ann.
Sc. Ecole Normale Supérieure, 1895). At this time noncontinuable Taylor
series appeared as a pathological situation. Examples were given by Poincaré
and by Hadamard ($\sum c_n z^{\lambda_n}$, where $(\lambda_{n+1} - \lambda_n)/\lambda_n$ is larger
than some positive number). Then Borel published a note in the Comptes-
Rendus in 1896 and an article in Acta Mathematica in 1897, with the same
title "Sur les séries de Taylor", and a remarkable statement: "une série de
Taylor admet, en général, son cercle de convergence comme coupure" (in
general, the circle of convergence of a Taylor series is a natural boundary).
Later on, in 1912, reviewing his previous work, he considered this
statement as a most important result of his (cf. "Oeuvres", 1, p. 154). Here is a
translation of his comments.

The main difficulty was to give a precise meaning before going to
the proof... One can divide the series into an infinity of successive
groups of terms, and assign to each group a point of the circle of
convergence which depends only on the coefficients of this group;
these points form a set E; each accumulation point of E is a singular
point. Clearly now, if the successive coefficients are chosen
randomly, that is, independently from the preceding coefficients, the
probability that E is dense on the circle equals one.

I  shall  discuss  this  statement  later.  For  the  time  being  let  me  observe  that 
random  Taylor  series  appear  as  the  initial  motivation  of  Borel  for  probability 
theory.  From  the  start  it  was  the  source  of  important  probabilistic  ideas. 
For  example,  here  is  the  germ  (actually,  the  first  statement)  of  the  so-called 
Borel-Cantelli lemma, in the 1897 article:


on a donc sur le cercle une infinité d'arcs indépendants, dont la
somme dépasse tout nombre donné, donc, en général, tout point
du cercle appartiendra à une infinité d'arcs...


(Translation:  one  has  infinitely  many  independent  arcs  on  the  circle  and 
their  sum  exceeds  any  given  number,  therefore,  in  general,  each  point  of  the 
circle belongs to infinitely many arcs). Clearly "in general" means "with
probability  one,"  except  that  the  notion  of  a  totally  additive  probability  was 
not  available  in  1897.  Random  Taylor  series  forced  Borel,  not  only  into  the 
Borel-Cantelli  lemma,  but,  what  was  much  more  important,  into  the  ideas 
of  totally  additive  measures  and  probabilities. 
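
In present-day notation, which is not Borel's, the statement he is reaching for is usually recorded as the second Borel-Cantelli lemma. In the sketch below the events $A_n$ stand for "the given point belongs to the $n$-th arc"; this identification is an interpretation of the quoted passage, not a formula from the 1897 article.

```latex
\text{If the events } A_1, A_2, \dots \text{ are independent and }
\sum_{n=1}^{\infty} P(A_n) = \infty,
\text{ then }
P\Big(\limsup_{n \to \infty} A_n\Big)
  = P\big(A_n \text{ occurs for infinitely many } n\big) = 1 .
```

For arcs placed independently and uniformly on the circle, $P(A_n)$ is the normalized length of the $n$-th arc, so the hypothesis is exactly that the sum of the lengths exceeds any given number.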

As far as I know, the topic was not discussed until 1929, when Steinhaus
introduced series (1.6), which I called Steinhaus Taylor series. For Steinhaus
the basic probability space was the interval $[0,1]$ equipped with the Lebesgue
measure, and the standard model for the $\omega_n$ was given in this way: if

$$\omega = \sum_{m=1}^{\infty} \beta_m 2^{-m} \qquad (2.1)$$

with $\beta_m = 0$ or 1 and, say, $\sum \beta_m = \infty$, then

oo 

con  =  ^  m:=2''-'.  (2.2) 

P  0 

In this way all problems on independent random variables can be reduced to
estimates of Lebesgue measures or integrals on $[0,1]$. For series (1.6) Borel's
argument is correct and can be simplified by use of the zero-one law (but,
again, the zero-one law of Kolmogorov was not available in 1929).
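
The device of Steinhaus can be sketched in a few lines of Python: the binary digits of a single point of $[0,1]$ are dealt out to infinitely many new numbers, which are then independent and uniformly distributed. The allocation rule used here, digit $m = 2^n(2p+1)$ going to position $p+1$ of the $n$-th number, is a hypothetical choice made for the example (any bijection between the pairs $(n,p)$ and the positive integers would do) and is not claimed to be the rule printed in (2.2). The names `digit_index` and `split_uniform` are likewise illustrative.

```python
import random


def digit_index(n, p):
    """Hypothetical allocation: digit number m = 2^n * (2p + 1), with n, p >= 0."""
    return (2 ** n) * (2 * p + 1)


def split_uniform(bits, n_sequences, n_digits):
    """Build omega_0, ..., omega_{n_sequences-1} from the binary digits of one omega.

    bits[m - 1] is the m-th binary digit beta_m of omega; each omega_n is
    truncated to n_digits binary digits.
    """
    omegas = []
    for n in range(n_sequences):
        value = 0.0
        for p in range(n_digits):
            m = digit_index(n, p)
            value += bits[m - 1] * 2.0 ** -(p + 1)
        omegas.append(value)
    return omegas


rng = random.Random(3)  # fixed seed, only for reproducibility
bits = [rng.getrandbits(1) for _ in range(256)]  # first 256 digits of omega
omegas = split_uniform(bits, n_sequences=4, n_digits=16)
print(omegas)
```

Because distinct pairs $(n,p)$ give distinct digit numbers, no digit of $\omega$ is used twice, which is what makes the resulting numbers independent.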

Steinhaus's article had a direct influence on Paley and Zygmund, but
they first considered that Rademacher Taylor series (1.3) were a much more
difficult matter than Steinhaus Taylor series (1.6); they announce the theorem
on non-continuation at the end of their first paper, and postpone the proof
until the last paper. A much simpler proof was given by Zygmund in
1933. Both proofs, the complicated and the simple, played a role in the
development of their work which I gave in the 60's.

The main other sources of P.Z. are orthogonal series, in particular
trigonometric series and Rademacher series. Lacunary trigonometric
series provided methods and inspiration. Here is a comment of Zygmund
(Oeuvres mathématiques de R. Salem, p. 24):

on peut s'exprimer en disant que, tandis que le caractère aléatoire
est intrinsèque dans les séries lacunaires, il est "greffé" dans les
séries (1.2).

(Translation: one may say that, while the random character is intrinsic in
lacunary series, it is "grafted" onto the series (1.2).)

Lacunary trigonometric series go back to the Weierstrass example of a
nowhere differentiable function. They became a real mathematical topic with
Kolmogorov (1924), Sidon (1927), Banach (1930), Zygmund (1930 and 1932),
and it so happened that most theorems on sums of independent random
variables were stated first for lacunary trigonometric series. This is true in
particular for the integrability properties of the partial sums (boundedness
in $L^p$) and the summability almost everywhere, established by Zygmund for
lacunary trigonometric series before being stated by Paley and Zygmund for
Rademacher series. Part of the influence of the Lebesgue integration theory
on probability theory goes through trigonometric series.

As a conclusion, the stream of ideas coming from Borel started with
a very specific problem, analytic continuation of random Taylor series. On
one hand it developed probabilistic notions. On the other, via the Lebesgue
theory, it renewed the study of trigonometric series and orthogonal systems,
and prepared the study of series of independent random variables. P.Z.
realizes a new step in considering random series of functions, in particular
random trigonometric series.


2.2.  From  Einstein  to  P.W.Z. 


The  Brownian  motion  (first  discovered  by  the  botanist  Brown,  and  studied  by 
several  physicists  during  the  19th  century)  was  rediscovered  by  Einstein  as  a 
necessary  consequence  of  the  assumption  that  statistical  thermodynamics  is 
valid for liquids as well as gases. Then he had the idea of using the quadratic
variation  property  in  order  to  derive  the  atomic  dimensions  from  macro¬ 
scopic  observations  (that  is,  the  Brownian  motion  of  a  particle  of  which  the 
mass  is  known).  This  was  in  the  famous  year  1905,  and  published  in  Annalen 
der  Physik,  as  were  his  papers  on  relativity  and  on  the  photoelectric  effect. 

The  program  of  Einstein  was  realized  by  Jean  Perrin.  From  the  physi¬ 
cist's  point  of  view  it  was  a  triumph,  leading  to  the  attribution  of  the  Nobel 
Prize  to  Jean  Perrin.  From  a  mathematical  point  of  view  an  enormous  task 
had  still  to  be  done,  and  it  was  performed  by  N.  Wiener  in  different  steps, 
starting  with  "Differential  space"  in  1923.  In  many  papers  on  Brownian 
motion (including "Differential space", P.W.Z., and especially chapter IX of
the  book  of  Paley  and  Wiener),  Wiener  quotes  Jean  Perrin: 


...  it  is  impossible  to  fix  a  tangent,  even  approximately,  and  we  are 
thus  reminded  of  the  continuous  non  differentiable  functions  of  the 
mathematicians.  It  would  be  incorrect  to  regard  such  functions  as 
mere  mathematical  curiosities,  since  indications  are  to  be  found  in 
nature  of  nondifferentiable  as  well  as  differentiable  processes  . . . 
(P.W.,  p.  157) 


The program of Wiener was to define a process $X(t,\omega)$ with the properties
pointed out by Einstein (independent increments, mean quadratic
variation  property)  together  with  the  almost  sure  properties  suggested  by 
Jean  Perrin  (continuity,  nowhere  differentiability).  The  definition  of  the  pro¬ 
cess  through  measures  and  integration  in  function  spaces  appears  already  in 
"Differential  space",  and  also  the  properties  of  the  Fourier  coefficients.  The 
Holder  property  of  order  |  appears  in  "Generalized  harmonic  analysis"  in 
1930. Only in P.W.Z. is the nowhere differentiability proved, in the stronger
form

$$\text{a.s.}(\omega)\ \ \forall t \qquad \limsup_{h \to 0} \frac{|X(t+h,\omega) - X(t,\omega)|}{|h|^{\alpha}} = \infty \qquad (2.3)$$

for every $\alpha > 1/2$, and it is observed that the Hölder condition holds for every
$\alpha < 1/2$. Later on, in the book with Paley, Wiener introduces $X(t,\omega)$, called
the fundamental random function, through the device of Steinhaus ((2.1),
(2.2)) and the Fourier series (the notation then is $\psi(x,\alpha)$), and gives a full
explanation of the program and the results.

Though  P.W.Z.  also  contains  results  on  Gaussian  Fourier  series  analo¬ 
gous  to  those  of  P.Z.  on  Rademacher  and  Steinhaus  series,  the  main  result 
is  the  almost  sure  nowhere  differentiability  of  Brownian  motion,  that  is,  the 
final  achievement  of  the  program  of  Wiener  coming  from  Einstein  and  Jean 
Perrin. 


3.  The  situation  in  1933 

Let  us  go  back  to  the  Rademacher,  Steinhaus  and  Gaussian  series  (1.1)  to 
(1.9),  and  summarize  what  was  known  in  1933. 

1) Suppose $\sum |c_n|^2 < \infty$ and $|f_n(t)| \le 1$ for all $n$ and $t \in [0,1]$. Then
the series (1.1), (1.4), (1.7) converge almost surely almost everywhere
to a sum $F(t,\omega)$. Moreover

$$\int_0^1 \exp(\lambda F^2(t,\omega))\,dt < \infty \quad \text{a.s.}(\omega) \qquad (3.1)$$

for each $\lambda > 0$, and it is a best possible result. The partial sums of the
series (1.2), (1.5), (1.8) satisfy

$$\sup_t |S_n(t,\omega)| = O(\log^{1/2} n) \quad \text{a.s.}(\omega)$$

and the sums of (1.3), (1.6), (1.9) are analytic functions satisfying

$$\sup_\theta |F(re^{i\theta},\omega)| = o\Big(\sqrt{\log \frac{1}{1-r}}\Big) \quad \text{a.s.}(\omega),$$

this being best possible again.

2) Suppose $\sum |c_n|^2 = \infty$ and $\liminf \int_E f_n^2(t)\,dt > 0$ for every
Borel set $E$ of measure $> 0$. Then the series (1.1), (1.4), (1.7) diverge
almost surely almost everywhere. Moreover, given any process of
summation $T$, each of these series is not $T$-summable almost surely
almost everywhere. As a consequence, it is almost sure that none of
the series (1.2), (1.5), (1.8) is a Fourier-Lebesgue (nor a Fourier-Stieltjes)
series. Another consequence is that, assuming $\limsup |c_n|^{1/n} = 1$, the
unit circle is a.s. a natural boundary for the random analytic functions
(1.3), (1.6), (1.9).

3) From 1 and 2 we know the probability of the event that (1.2) represents
an $L^p$-function when $1 \le p < \infty$. This probability does not depend
on $p$, and it is 1 or 0 according to the convergence or divergence of
$\sum |c_n|^2$. This provides a new and beautiful proof of a theorem
of Littlewood: no condition on the amplitudes of the coefficients of
a trigonometric series, strictly better than the Riesz-Fischer condition
$\sum |c_n|^2 < \infty$, implies that the series is a Fourier-Lebesgue series. For
changing the signs randomly, $\sum |c_n|^2 = \infty$ implies $\sum \pm c_n e^{int} \notin L^1$.

4) If $\sum |c_n|^2 \log^{1+\epsilon} |n| < \infty$ ($\epsilon > 0$), (1.2) converges uniformly with
respect to $t$ a.s., and the same for (1.5) and (1.8). This is no longer true
for $\epsilon = 0$. Moreover, writing

$$s_j = \Big(\sum_{2^j \le n < 2^{j+1}} |c_n|^2\Big)^{1/2}$$

and assuming $\sum s_j = \infty$, (1.5) is Abel-divergent a.s.a.e., therefore
(1.5) $\notin L^\infty$ (P.Z. states the result for (1.2) but proves it for (1.5), with
a beautiful kind of martingale argument). We see that no explicit
condition on the $c_n$ is given for the series (1.2), (1.5) or (1.8) to represent
continuous functions or bounded functions.

5) For the Brownian motion (or series (1.8) with $c_n = 1/n$) we have
continuity and nowhere differentiability in the strong form given above (see
(2.3) and the Hölder condition).
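
The growth statement in item 1 can be explored numerically. The sketch below approximates $\sup_t |S_N(t,\omega)|$ for one sample of a Rademacher trigonometric series by taking the maximum over a uniform grid, and prints it next to $\sqrt{\log N}$. The coefficients $c_n = 1/n$, the grid size, the seed and the name `sup_partial_sum` are assumptions for illustration; a single finite sample shows an order of magnitude only and proves nothing about the almost sure statement.

```python
import cmath
import math
import random


def sup_partial_sum(coeffs, signs, grid_size=512):
    """Approximate sup_t |S_N(t)| on a uniform grid of [0, 2*pi).

    S_N(t) = Re sum_{n=1}^{N} c_n * eps_n * exp(i*n*t). The grid maximum can
    only underestimate the true supremum.
    """
    best = 0.0
    for k in range(grid_size):
        t = 2.0 * math.pi * k / grid_size
        s = sum(
            (c * e * cmath.exp(1j * n * t)).real
            for n, (c, e) in enumerate(zip(coeffs, signs), start=1)
        )
        best = max(best, abs(s))
    return best


rng = random.Random(1)  # fixed seed, only for reproducibility
N = 128
coeffs = [1.0 / n for n in range(1, N + 1)]  # square-summable, illustrative
signs = [rng.choice((-1, 1)) for _ in range(N)]
sup_value = sup_partial_sum(coeffs, signs)
print(sup_value, sup_value / math.sqrt(math.log(N)))
```

Two sanity bounds hold for any choice of signs: the grid maximum is at most $\sum |c_n|$, and at least the quadratic mean $(\sum |c_n|^2 / 2)^{1/2}$ of $S_N$ over the grid.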

The year 1933 is also the year when the Grundbegriffe der
Wahrscheinlichkeitsrechnung of Kolmogorov and the Asymptotische Gesetze der
Wahrscheinlichkeitsrechnung of Khintchin were published. The law of
the iterated logarithm had been discovered a few years before. The new
and majestic stream of modern probability theory was just born. At a first
look P.Z. and P.W.Z. close a period, when probability should be reduced to
the familiar Lebesgue measure on the line. They used the notations and
language of analysis, they ignored the newborn foundations of probability
theory, and they reached very sharp and sometimes final results on the
specific problems which they considered.

However  P.Z.  and  P.W.Z.  were  also  the  source  of  much  subsequent 
work.  In  particular,  I  mention  the  study  of  Salem  and  Zygmund  (1954)  and 
the  applications  which  I  gave  of  their  methods,  the  thesis  of  Billard  (1963), 
the  two  editions  of  my  book  Some  random  series  of  functions  (SRSF  1969 
and  1985),  and  the  book  of  Marcus  and  Pisier  (1981).  Here  is  a  personal 
selection  of  themes. 


4.  The  local  behaviour  of  Brownian  motion 


Let me begin with the end of P.W.Z. The local version of the law of the
iterated logarithm reads as follows:

$$\forall t \ \ \text{a.s.} \qquad \limsup_{h \to 0} \frac{X(t+h) - X(t)}{\sqrt{2\,|h| \log\log 1/|h|}} = 1.$$

Therefore (Fubini), almost surely, the $t$-set defined by

$$\limsup_{h \to 0} \frac{|X(t+h) - X(t)|}{\sqrt{2\,|h| \log\log 1/|h|}} = 1 \qquad (4.1)$$

is of full Lebesgue measure. When (4.1) is satisfied we say that $t$ is an
ordinary point. Do there exist other points? This is not obvious, and Paul
Lévy expressed the feeling that all $t$ are ordinary (Processus stochastiques et
mouvement brownien, 1948, p. 247).

The Hölder condition of order $\alpha < 1/2$ was improved by Paul Lévy. One
has a.s.

$$X(t+h) - X(t) = O\Big(\sqrt{|h| \log \frac{1}{|h|}}\Big) \qquad (h \to 0) \qquad (4.2)$$

uniformly on each bounded interval, and the $O$ estimate can be made more
precise.
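
A rough numerical illustration of (4.2): the sketch below simulates a Brownian path on a grid of $[0,1]$ and compares the largest increment over a lag $h$ with $\sqrt{2h \log(1/h)}$, the normalization that appears in Lévy's modulus of continuity theorem. The grid size, the lags, the seed and the function names are assumptions for illustration; a discretized path only samples the increments and cannot reproduce the limit $h \to 0$.

```python
import math
import random


def brownian_path(n_steps, rng):
    """Values X(k / n_steps), k = 0..n_steps, of a simulated Brownian path on [0, 1]."""
    step_sd = math.sqrt(1.0 / n_steps)
    path = [0.0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.gauss(0.0, step_sd))
    return path


def modulus_ratio(path, lag):
    """max_t |X(t+h) - X(t)| / sqrt(2 h log(1/h)) over grid points, h = lag / n_steps."""
    n_steps = len(path) - 1
    h = lag / n_steps
    largest = max(abs(path[k + lag] - path[k]) for k in range(n_steps - lag + 1))
    return largest / math.sqrt(2.0 * h * math.log(1.0 / h))


rng = random.Random(2)  # fixed seed, only for reproducibility
path = brownian_path(4096, rng)
for lag in (4, 16, 64):
    print(lag, modulus_ratio(path, lag))
```

The printed ratios stay of order one, which is the content of the $O$ estimate; the increment at a single fixed $t$ is typically much smaller, of order $\sqrt{h}$.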

There is also a stronger version of nowhere differentiability than (2.3),
namely

$$\text{a.s.} \ \ \forall t \qquad \limsup_{h \to 0} \frac{|X(t+h) - X(t)|}{\sqrt{|h|}} > 0. \qquad (4.3)$$

This is due to Dvoretzky (1963).

Not all points are ordinary points. There are rapid points, for which

$$\limsup_{h \to 0} \frac{|X(t+h) - X(t)|}{\sqrt{|h| \log 1/|h|}} > 0 \qquad (4.4)$$

and slow points, for which

$$\limsup_{h \to 0} \frac{|X(t+h) - X(t)|}{\sqrt{|h|}} < \infty. \qquad (4.5)$$

Comparing (4.4) with (4.2) and (4.5) with (4.3), we see that all orders of
magnitude are sharp. Rapid points were discovered by Orey and Taylor (1974)
and slow points by me. There were further studies on slow points by Davis
(1983), Perkins (1983), Davis and Perkins (1985), with more information on
the law of the lim sup in (4.5), and estimates for the Hausdorff dimension of the
$t$'s for which this lim sup does not exceed a given number. Proofs of the existence
of rapid and slow points can be found in SRSF 1985.

Let me explain how the existence of slow points in the zero-set of
$X(\cdot)$ can be viewed. Let $E = X^{-1}(0)$ be the zero-set, and $(I_j)$ the family of
contiguous intervals, $I_j = (a_j, a_j + \ell_j)$. From Paul Lévy's construction the

{  Kahane 


490  } 

restrictions of $X(\cdot)$ to the $I_j$ are mutually independent and, when normalized
in the form

$$Y_j(s) = \ell_j^{-1/2} X(a_j + \ell_j s),$$

they are independent from $E$ as well. Given $A > 0$, the $t$-set such that
$X^2(t+h) < A\,|h|$ for all $h$ can be viewed as the set which stays illuminated
when the sun runs from direction $(1, A)$ to direction $(-1, A)$ and the sunlight
is stopped by the graph of $X^2(\cdot)$. The shadowed interval corresponding to
$Y_j^2(\cdot)$ is obtained from $I_j$ by a random enlargement:

$$I_j' = (a_j - \ell_j \beta_j,\ a_j + \ell_j + \ell_j \gamma_j),$$

where the couples $(\beta_j, \gamma_j)$ are independent copies of a random couple $(\beta, \gamma)$
whose law depends on $A$ only. It is rather intuitive and it can be proved
that the $I_j'$ cover $\mathbb{R}$ a.s. when $A$ is small enough, do not cover $\mathbb{R}$ a.s. when
$A$ is large, and moreover that $\mathbb{R} \setminus \bigcup I_j'$ has a Hausdorff dimension tending to
1/2 (the dimension of $E = \mathbb{R} \setminus \bigcup I_j$) as $A \to \infty$. My 1976 note contains the
full proof.

This approach uses the Paul Lévy construction and cannot be applied
to other processes. However analogues of formulas (4.1) to (4.5), including
the existence of rapid and slow points, hold for many processes. For example,
the whole is valid for fractional Brownian motion of index $\alpha$ ($-1/2 < \alpha <
1/2$) when $|h|$ is replaced by $|h|^{1+2\alpha}$. This extends to Gaussian Fourier
series when

$$|c_n| \approx n^{-1-\alpha}$$

(meaning that the ratio is bounded above and below by positive numbers).
(4.2) and (4.3) (with $|h|^{\alpha+1/2}$ in place of $|h|^{1/2}$) extend to all Rademacher,
Steinhaus or Gaussian Fourier series for which

$$0 < \liminf_{j} \big(2^{j(\alpha+1/2)} s_j\big) \le \limsup_{j} \big(2^{j(\alpha+1/2)} s_j\big) < \infty$$

where

$$s_j = \Big(\sum_{2^j \le n < 2^{j+1}} |c_n|^2\Big)^{1/2}. \qquad (4.6)$$

Proofs, references and comments can be found in SRSF 1968 and my paper in
honour of A. Zygmund (1983). Slow points are not known for Rademacher
or Steinhaus Fourier series.

It is natural and usual to compare random Fourier series and lacunary
trigonometric series (I have already quoted Zygmund in this respect). To fix
the ideas let us consider

$$f(t) = \sum_{j=1}^{\infty} 2^{-j\alpha} \cos 2^j t \qquad (4.7)$$

and series (1.8) with $c_n = n^{-\alpha-1/2}$, $0 < \alpha \le 1$. They have much in common.
However, when $0 < \alpha < 1$, the local behaviour of $f(\cdot)$ is the same at all points:

$$0 < \limsup_{h \to 0} \frac{|f(t+h) - f(t)|}{|h|^{\alpha}} < \infty;$$

the function $f(\cdot)$ is more regularly irregular than the corresponding Gaussian
process. On the other hand, when $\alpha = 1$, $f(\cdot)$ satisfies


$$f(t+h) + f(t-h) - 2f(t) = O(|h|), \qquad f(t+h) - f(t) = O\Big(|h| \log \frac{1}{|h|}\Big) \qquad (4.8.1)$$

uniformly with respect to $t$,

$$0 < \limsup_{h \to 0} \frac{|f(t+h) - f(t)|}{|h| \log \frac{1}{|h|}} < \infty \qquad (4.8.2)$$

for quasi all $t$ (meaning except on a set of the first category of Baire),

$$0 < \limsup_{h \to 0} \frac{|f(t+h) - f(t)|}{|h| \sqrt{\log \frac{1}{|h|} \log\log\log \frac{1}{|h|}}} < \infty \qquad (4.8.3)$$

for almost every $t$,

$$\limsup_{h \to 0} \frac{|f(t+h) - f(t)|}{|h|} < \infty \qquad (4.8.4)$$

on a dense set of $t$, and

f is nowhere differentiable. (4.8.5)

Here (4.8.3) is the behaviour at ordinary points, (4.8.2) at rapid points, (4.8.4)
at slow points. In this case, the lacunary series (4.7) has the same kind of
irregular irregularity as the Brownian motion. These results on lacunary
Fourier series are due to Géza Freud (1962, 1966) (see also Izumi, Izumi,
Kahane 1965 and Kahane 1986 where several other references can be found).
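
The uniform behaviour of the lacunary series for $0 < \alpha < 1$ can be checked numerically on a truncated sum. The sketch below evaluates a partial sum of (4.7) and the ratio $|f(t+h) - f(t)| / |h|^{\alpha}$; the truncation at 30 terms, the value $\alpha = 1/2$, the sample points and the function names are assumptions made for the example, and a finite sum can only illustrate the upper bound, not the lower one.

```python
import math


def lacunary_sum(t, alpha, n_terms=30):
    """Partial sum of (4.7): sum_{j=1}^{n_terms} 2^(-j*alpha) * cos(2^j * t)."""
    return sum(
        2.0 ** (-j * alpha) * math.cos((2.0 ** j) * t)
        for j in range(1, n_terms + 1)
    )


def holder_ratio(t, h, alpha, n_terms=30):
    """|f(t+h) - f(t)| / |h|^alpha for the truncated series."""
    increment = lacunary_sum(t + h, alpha, n_terms) - lacunary_sum(t, alpha, n_terms)
    return abs(increment) / abs(h) ** alpha


alpha = 0.5
for t in (0.0, 0.3, 1.0, 2.5):
    ratios = [holder_ratio(t, 2.0 ** -k, alpha) for k in range(2, 13)]
    print(t, max(ratios))
```

For $\alpha = 1/2$, splitting the sum at the index $j$ with $2^{-j}$ comparable to $|h|$ and bounding each term by $2^{-j\alpha} \min(2^j |h|, 2)$ gives a ratio below 12 for every $t$ and every dyadic $h$, which is what the printed maxima reflect.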


5.  A  few  applications  of  Brownian  motion 
and  Gaussian  processes  to  Fourier  analysis 

From P.Z. comes an important and general idea: in many circumstances it
is hard to find a mathematical object with some prescribed properties, and
pretty easy to exhibit a random object which enjoys these properties almost
surely. I already insisted on the use of Rademacher trigonometric series in
order to prove that no better condition than Riesz-Fischer's on the moduli of
coefficients implies that a trigonometric series is a Fourier series. Let me give
a few other examples using Brownian motion or Gaussian Fourier series.

1) A problem of Beurling on closed sets. Given a closed set $E \subset \mathbb{R}^d$, with
Hausdorff dimension $\alpha$, no nonzero measure $\mu$ carried on $E$ satisfies
$\hat\mu(u) = O(|u|^{-\beta})$ with $\beta > \alpha/2$; this comes from potential theory. Is
it possible to find $E$ with a prescribed dimension $\alpha$, carrying nonzero
measures $\mu$ such that $\hat\mu(u) = O(|u|^{-\beta})$ for all $\beta < \alpha/2$? Let me remark
that if $d \ge 2$ and $\alpha = d - 1$, the $(d-1)$-dimensional sphere answers the
question, and the boundary of the $d$-dimensional cube is not a solution,
because $\hat\mu(u)$ cannot even tend to zero. The difficult situation is $d = 1$
and $0 < \alpha < 1$. Salem solved Beurling's problem by means of an ad hoc
random construction (1950). The Brownian motion provides a simple
solution: choose any closed set $F \subset \mathbb{R}^+$, with dimension $\alpha/2$; then $X(F)$
answers Beurling's problem a.s. In short, Brownian images of closed
sets are Salem sets [26].

2) A problem on U-sets. A compact set $E \subset \mathbb{R}^d$ is called a U-set if it
carries no nonzero distribution whose Fourier transform tends to zero
at infinity. For example, the boundary of the $d$-dimensional cube is a
U-set. Salem introduced in 1944 an entropy condition which implies
that $E$ is a U-set, namely

$$\liminf_{\epsilon \to 0} \frac{N_\epsilon(E)}{\log \frac{1}{\epsilon}} = 0$$

where $N_\epsilon(E)$ denotes the smallest number of balls of radius $\epsilon$ whose
union contains $E$. Actually Salem's condition implies more, namely

$$\limsup_{|u| \to \infty} |\hat T(u)| = \sup_u |\hat T(u)|$$

for all pseudomeasures $T$ (distributions with bounded Fourier
transforms) carried by $E$. Using Brownian images ($E = X(F)$) Salem's
condition appears best possible in the following sense:

a) given $\delta > 0$ there exists $E \subset \mathbb{R}$ with $N_\epsilon(E) = O(\log \frac{1}{\epsilon})$, carrying a
probability measure $\mu$ such that $\limsup_{|u| \to \infty} |\hat\mu(u)| \le \delta$.

b) given any function $A(\epsilon)$ tending to $\infty$ as $\epsilon \to 0$ there exists a
non-U-set $E$ with $N_\epsilon(E) = O(A(\epsilon) \log \frac{1}{\epsilon})$ ($\epsilon \to 0$) [27].

3) A problem on modifications of continuous functions. A famous theorem of Menšov says that, given a continuous function $f$ on $\mathbb{T}$, and $\varepsilon > 0$, it is possible to change $f$ on a set of Lebesgue measure $\le \varepsilon$ and get a "good" function, meaning that the Fourier series converges uniformly [2, p. 438-457]. Katznelson proved that "good" cannot mean that the Fourier series converges absolutely: there exists a continuous function $f$ such that no restriction of $f$ to a set of positive Lebesgue measure can be represented by an absolutely convergent trigonometric series (1975). Olevskii was able to get such bad functions $f$ in every Hölder class of index $\alpha < 1/2$ (1978) and Hruščev had the idea of using Gaussian Fourier series in order to get these bad functions (1981). Actually the Brownian motion is an example [21].

4) Another problem on modifications of continuous functions. Given again a continuous function $f$ on $\mathbb{T}$, there exists a homeomorphism $\varphi : \mathbb{T} \to \mathbb{T}$ such that $f \circ \varphi$ has a uniformly convergent Fourier series. This is a theorem of Bohr and Pál for real valued functions, and of Saakian, Kahane and Katznelson for complex valued functions (for a history and comments, see Kahane 1982). Is it possible to replace uniformly convergent by absolutely convergent? The answer is negative (Olevskii 1981). Open question: does the Brownian motion provide an example? (Here as above, we subtract a linear term from the usual Brownian motion in order to have a continuous function on the circle.)

5) The problem of spectral synthesis in $\ell^1(\mathbb{Z})$. It reads as follows: given $(a_n)$ in $\ell^1(\mathbb{Z})$ and $(b_n)$ in $\ell^\infty(\mathbb{Z})$, such that the function

$$f(t) = \sum_{n \in \mathbb{Z}} a_n e^{int}$$

vanishes on the support of the pseudomeasure

$$T(t) \sim \sum_{n \in \mathbb{Z}} b_n e^{int},$$

is it true that necessarily

$$\langle T, f \rangle = \sum_{n \in \mathbb{Z}} a_n b_{-n} = 0?$$

In 1959, Malliavin solved the question in a negative way, using a lacunary trigonometric series for the definition of $f$. Actually Gaussian trigonometric series give easier computations on applying Malliavin's idea, namely, to define $T$ as $\delta'(f)$, $\delta'$ being the derivative of the Dirac measure. This is very much in the spirit of local time (which can be defined as $\delta(f)$).


6.  Back  to  random  Taylor  series 

A discussion of Borel's statement on random Taylor series

$$\sum_{n=0}^{\infty} X_n z^n \tag{6.1}$$

whose coefficients are independent complex random variables will show the crucial role of Rademacher series.

Let us start with Rademacher Taylor series (1.3), and suppose that $\limsup |c_n|^{1/n} = 1$, i.e., the circle of convergence is the unit circle $|z| = 1$. We want to prove that the unit circle is a natural boundary a.s. (theorem of P.Z., proved in their last paper). Suppose the contrary: there exists an arc $I$ on the circle which consists of regular points (let us say: a regular arc) with a positive probability, and this probability is 1 by the zero-one law. The length of $I$ exceeds $2\pi/N$ for $N$ large enough.

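Now we change signs in (1.3): we define $\varepsilon_n^k = \varepsilon_n$ when $n \equiv k \pmod N$ and $\varepsilon_n^k = -\varepsilon_n$ when $n \not\equiv k \pmod N$. The series

$$\sum_{n=0}^{\infty} \varepsilon_n^k c_n z^n \tag{1.3$^k$}$$

has the same almost sure properties as (1.3), and adding (1.3) and (1.3$^k$) we obtain a function of the form $z^k H_k(z^N)$. Being regular on $I$, and invariant under the rotation of angle $2\pi/N$ while the length of $I$ exceeds $2\pi/N$, $H_k(z^N)$ is regular on the unit circle (a.s.). Now

$$\sum_{k=0}^{N-1} z^k H_k(z^N) = 2 \sum_{n=0}^{\infty} \varepsilon_n c_n z^n,$$

therefore the unit circle is regular for (1.3), a contradiction. (In the edition of SRSF I thought that this proof was new. Actually it was given by Zygmund in 1933.)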
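The sign-change step can be checked on a truncated series. The sketch below is only an illustration under assumptions: the coefficients `c`, the modulus `N` and the truncation order `M` are hypothetical choices, and the helper names are not from the text. It verifies that adding (1.3) to its sign-changed copy keeps only the powers $n \equiv k \pmod N$, and that summing over $k$ gives back twice the original series.

```python
import random

def flipped_signs(eps, k, N):
    """Signs eps^k_n: keep eps_n when n = k (mod N), reverse it otherwise."""
    return [e if n % N == k else -e for n, e in enumerate(eps)]

def coefficients(signs, c):
    """Coefficients of the truncated series sum signs[n] * c[n] * z**n."""
    return [s * x for s, x in zip(signs, c)]

def residue_class_sums(eps, c, N):
    """For each k, coefficients of (1.3) + (1.3^k), i.e. of z^k H_k(z^N)."""
    base = coefficients(eps, c)
    return [
        [a + b for a, b in zip(base, coefficients(flipped_signs(eps, k, N), c))]
        for k in range(N)
    ]

random.seed(0)                          # fixed seed, only for reproducibility
N, M = 4, 20                            # hypothetical modulus and truncation order
c = [1.0 / (n + 1) for n in range(M)]   # hypothetical coefficients c_n
eps = [random.choice((-1, 1)) for _ in range(M)]

sums = residue_class_sums(eps, c, N)
# Summing the N functions z^k H_k(z^N) coefficient by coefficient.
total = [sum(column) for column in zip(*sums)]
```

Each list in `sums` vanishes outside one residue class modulo `N`, which is the algebraic content of the form $z^k H_k(z^N)$; the analytic part of the argument (regularity on the arc) is of course not something a finite computation can show.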

The unit circle plays no special role. What we proved is that the circle of convergence of (1.3), whenever it exists, is a natural boundary a.s. This result extends to series (6.1) whenever the $X_n$ are symmetric (same law for $X_n$ and $-X_n$). Here is a way to see it. The $X_n$ are defined on $\Omega$. Let us introduce another probability space $\Omega'$ and a Rademacher sequence $\varepsilon_n$ on $\Omega'$. Now consider $\Omega \times \Omega'$ as our probability space, and the random series

$$\sum_{n=0}^{\infty} \varepsilon_n(\omega') X_n(\omega) z^n. \tag{6.2}$$

For each given $\omega'$, (6.2) is nothing but another version of (6.1), therefore (6.2) has the same almost sure properties (on $\Omega \times \Omega'$) as (6.1) (on $\Omega$). For each given $\omega$, (6.2) is a Rademacher Taylor series, therefore has its circle of convergence as a natural boundary a.s. (on $\Omega'$). Therefore the same holds a.s. on $\Omega \times \Omega'$, proving the result. Let me observe that the radius of convergence of (6.1) is a constant a.s.; we can speak of the circle of convergence of (6.1) whenever this constant is neither 0 nor $\infty$.

When the $X_n$ are not symmetric, we use again $\Omega \times \Omega'$ and consider

$$\sum_{n=0}^{\infty} \bigl(X_n(\omega) - X_n(\omega')\bigr) z^n. \tag{6.3}$$

Now the coefficients are symmetric and independent, and we can apply the previous result. Either $|z| = \rho$ is a.s. a natural boundary for (6.1), or (6.3) has an a.s. radius of convergence $\rho' > \rho$. In this case a.s.($\omega'$) a.s.($\omega$) the circle $|z| = \rho'$ is a natural boundary for (6.3) (with the obvious modification when $\rho' = \infty$). We can choose $\omega'$ in such a way that

1) the radius of convergence of $\sum X_n(\omega') z^n$ is $\rho$;

2) a.s.($\omega$) the circle of convergence of (6.3) is a natural boundary.

Writing $X_n(\omega') = x_n$, we decompose (6.1) in the form

$$\sum_{n=0}^{\infty} x_n z^n + \sum_{n=0}^{\infty} (X_n - x_n) z^n \tag{6.4}$$

and we obtain as a final statement something slightly different from Borel's, namely: either $|z| = \rho$ is a.s. a natural boundary for (6.1), or (6.1) can be decomposed in the form (6.4), where the first series converges for $|z| < \rho$ and the second a.s. in a larger disc $|z| < \rho'$, which is its domain of existence.

This is a theorem of Ryll-Nardzewski (1953), answering a question of Blackwell. The proof given here is taken from SRSF.

From the proof we can extract two principles which apply in more general situations, called the principle of reduction and the device of symmetrization in SRSF pp. 8, 9. Now we consider independent random vectors in a linear space and we denote them by $X_n$. If the $X_n$ are symmetric and $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n, \ldots$ is a fixed sequence with values $-1$ or $1$, the random sequences $(X_n)$ and $(\varepsilon_n X_n)$ have the same distribution (that is what we used first). If the $X_n$ are symmetric and $(\varepsilon_n)$ is a Rademacher sequence independent from $(X_n)$, the random sequences $(X_n)$ and $(\varepsilon_n X_n)$ have the same distribution. As an application, let us consider a property P which can be satisfied or not by any sequence of vectors, and suppose that, whenever $(\varepsilon_n)$ is a Rademacher sequence and $(x_n)$ is a constant sequence of vectors, $(\varepsilon_n x_n)$ satisfies P a.s.; then, assuming that "$(X_n)$ satisfies P" is an event, this event is almost sure. From this principle of reduction it appears that the Rademacher series of functions, or Rademacher series of vectors, plays a central role.

Now, given a random vector $X$ defined on $\Omega$, the random vector $Y(\omega, \omega') = X(\omega) - X(\omega')$ defined on the product space $\Omega \times \Omega$ is symmetric. When we know that $Y$ enjoys a given property a.s. (on $\Omega \times \Omega$), it follows that there exists $\omega' \in \Omega$ and a fixed vector $x = X(\omega')$ such that $X - x$ enjoys the same property a.s. (in $\Omega$). This is the device of symmetrization.
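Both facts used above, that a random sign does not change a symmetric law and that $X(\omega) - X(\omega')$ is always symmetric, can be checked exactly on finite scalar laws. The following is a minimal sketch under assumptions: the two laws `skewed` and `symmetric` are hypothetical examples, a law is represented as a dictionary from values to probabilities, and the function names are illustrative only.

```python
from collections import defaultdict
from fractions import Fraction

def law_of_difference(law):
    """Exact law of X(w) - X(w') for two independent copies with the given law."""
    out = defaultdict(Fraction)
    for x, p in law.items():
        for y, q in law.items():
            out[x - y] += p * q
    return dict(out)

def law_of_signed(law):
    """Exact law of eps * X, with eps a fair random sign independent of X."""
    out = defaultdict(Fraction)
    for x, p in law.items():
        out[x] += p / 2
        out[-x] += p / 2
    return dict(out)

def is_symmetric(law):
    """True when the values v and -v always carry the same probability."""
    return all(law.get(-v, Fraction(0)) == p for v, p in law.items())

# Hypothetical non-symmetric law, and a hypothetical symmetric one.
skewed = {0: Fraction(1, 2), 1: Fraction(1, 3), 5: Fraction(1, 6)}
symmetric = {-2: Fraction(1, 4), -1: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 4)}

difference = law_of_difference(skewed)
```

Here `difference` is symmetric although `skewed` is not, and `law_of_signed(symmetric)` coincides with `symmetric`. Exact fractions are used so that the comparisons involve no rounding.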
A third principle comes directly from the original proof of P.Z. of the non-continuation of series (1.3). Here is the idea: if a Taylor series $\sum a_n z^n$ is continuable across the circle of convergence to a point $\zeta$, the series $\sum a_n \zeta^n$, though divergent, is summable by a convenient summation matrix. Now, given any summation matrix $S$ (the exact definition is given in SRSF p. 12) and a series $\sum X_n$ of independent symmetric vectors in a Banach space, if the series is $S$-summable a.s., it converges a.s. This third principle (a.s. summability implies a.s. convergence) can be applied in many situations. It gives non-continuation theorems for random Taylor series with several variables, for random Dirichlet series and for other series of analytic functions. Through Fejér's or Lebesgue-Fejér's theorem on Fourier series it has a key role in order to explain that (1.2) fails a.s. to represent an $L^1$ function when $(c_n) \notin \ell^2$.

7.  Back  to  random  trigonometric  series 

Let us consider Rademacher trigonometric series

$$\sum_{n=0}^{\infty} \varepsilon_n r_n \cos(nt + \varphi_n) \tag{7.1}$$

where the amplitudes $r_n$ and the phases $\varphi_n$ are given, and Steinhaus trigonometric series

$$\sum_{n=0}^{\infty} r_n \cos(nt + 2\pi\omega_n) \tag{7.2}$$

where the $r_n$ are given, and more generally

$$\sum_{n=0}^{\infty} X_n \cos(nt + \Phi_n) \tag{7.3}$$

where the $X_n e^{i\Phi_n}$ are independent symmetric complex variables.

Via the principle of reduction, the P.Z. theorems say that the following properties have the same probability, 0 or 1:

■ (7.3) is a Fourier-Lebesgue series

■ (7.3) represents a function which belongs to all $L^p$ ($1 \le p < \infty$)

■ (7.3) converges almost everywhere

■ (7.3) is Abel-summable almost everywhere.

The case of uniform convergence is difficult, though a rather precise result can be given in terms of the quantities

$$s_j = \Bigl(\sum_{2^j \le n < 2^{j+1}} r_n^2\Bigr)^{1/2}$$

(already introduced in slightly different forms in (3.2) and (4.6)) for Rademacher and Steinhaus trigonometric series (7.1) and (7.2). Here is the result:

1) If $\sum s_j = \infty$, (7.1) and (7.2) fail a.s. to represent a bounded function.

2) If $\sum s_j < \infty$ and $s_j$ is a decreasing sequence, (7.1) and (7.2) converge uniformly a.s., therefore represent continuous functions a.s.

The second statement is a corollary of a result of Salem and Zygmund (1954). The first was stated in P.Z. but proved only for Steinhaus series (7.2), as I already said. The first proof was given in Billard's thesis (1963).
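To get a feeling for the criterion, the dyadic block norms can be computed for a given list of amplitudes. This is a sketch under assumptions: it takes $s_j$ to be the $\ell^2$ norm of the amplitudes $r_n$ over the block $2^j \le n < 2^{j+1}$, the amplitudes $r_n = 1/(n \log(n+1))$ are a hypothetical example, and the function names are illustrative.

```python
import math

def dyadic_block_norms(r):
    """Return [s_0, s_1, ...], s_j being the l2 norm of r[n] over 2**j <= n < 2**(j+1).

    r[n] is the amplitude of frequency n; r[0] is ignored because n = 0
    belongs to no dyadic block. A last incomplete block is still included.
    """
    norms = []
    j = 0
    while 2 ** j < len(r):
        block = r[2 ** j : 2 ** (j + 1)]
        norms.append(math.sqrt(sum(x * x for x in block)))
        j += 1
    return norms

def is_decreasing(seq):
    """True when the sequence never increases."""
    return all(a >= b for a, b in zip(seq, seq[1:]))

# Hypothetical amplitudes r_n = 1 / (n log(n + 1)) for 1 <= n < 2**12.
r = [0.0] + [1.0 / (n * math.log(n + 1)) for n in range(1, 2 ** 12)]
s = dyadic_block_norms(r)
partial_sum = sum(s)
```

A finite list of amplitudes can only give the first few $s_j$ and a partial sum; whether $\sum s_j$ converges has to be decided from the asymptotic behaviour of the $r_n$.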

Billard's idea was to derive properties of Rademacher series from properties of Steinhaus series, by means of a principle of contraction. We shall see a general form of the principle of contraction in the context of random series in Banach space. The initial inspiration was to prove that (7.1) and (7.2) fail to represent a bounded function with the same probability. Here is a consequence of Billard's theory: the following properties have the same probability, 0 or 1:

• (7.3) represents a bounded function

• (7.3) represents a continuous function

• (7.3) converges everywhere

• (7.3) converges uniformly.

Of course, the interest of Steinhaus trigonometric series is to involve random translations, and Billard's theory contains interesting statements on series of random translates.

Let me give an application of Rademacher trigonometric series to a property of Fourier coefficients of continuous functions. Is it true that, given a positive sequence $(r_n)$ in $\ell^2$, one can choose the phases $\varphi_n$ in such a way that

$$\sum_{n=0}^{\infty} r_n \cos(nt + \varphi_n)$$

represents a continuous function? The answer is no, and is given by a lacunary series. Is it true now that, given $(r_n)$ in $\ell^2$, one can enlarge the $r_n$ (i.e., choose $r'_n \ge r_n$) and choose the $\varphi_n$ in such a way that

$$\sum_{n=0}^{\infty} r'_n \cos(nt + \varphi_n)$$

represents a continuous function? The answer is yes, using an iterative randomization (De Leeuw, Kahane, Katznelson 1977).

8. Rademacher series in Banach space

My first motivation for considering Rademacher series in Banach spaces was to obtain the principle "summability implies convergence", and the contraction principle, in this context. The contraction principle depends on an integrability property of the sum of Rademacher series, which I obtained as a consequence of the fact that, if the probability that such a sum is large is small, the probability that it is twice as large is very small:

$$\mathcal{P}(\|V\| > r) < \alpha \implies \mathcal{P}(\|V\| > 2r) < \alpha^2$$

(Kahane 1964).

As a consequence, not only $\|V\| \in L^1(\Omega)$, but $\exp \lambda\|V\| \in L^1(\Omega)$ when $\lambda$ is small enough. Actually the final result in this direction is due to Kwapień (1976): $\exp \alpha\|V\|^2 \in L^1(\Omega)$ for all $\alpha > 0$.

The contraction principle expresses that the a.s. convergence (or boundedness) of a Rademacher series of vectors

$$\sum_{n=1}^{\infty} \varepsilon_n u_n \tag{8.1}$$

implies the same for any series $\sum \varepsilon_n \lambda_n u_n$, where $\lambda_n$ is a given bounded sequence. As a consequence, the Rademacher series and the corresponding Steinhaus series

$$\sum_{n=1}^{\infty} e^{2\pi i \omega_n} u_n \tag{8.2}$$

have the same probability to converge (and also the same probability that the partial sums are bounded). In the case of trigonometric series in the space of continuous functions on $\mathbb{T}$, Billard's theorem is recovered. It is the way things are explained in SRSF.
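For finitely many vectors the contraction principle has a quantitative counterpart that can be verified by enumerating all sign patterns. The sketch below rests on assumptions: it uses the standard finite form $E\|\sum \varepsilon_n \lambda_n u_n\| \le \sup_n |\lambda_n| \, E\|\sum \varepsilon_n u_n\|$ for real multipliers, which is not quoted from the text, and the vectors `u` in $\mathbb{R}^3$ with the sup norm and the multipliers `lam` are hypothetical data.

```python
import itertools

def expected_norm(vectors, multipliers=None):
    """Exact E || sum eps_n * lambda_n * u_n || in the sup norm, over all sign patterns."""
    if multipliers is None:
        multipliers = [1.0] * len(vectors)
    dim = len(vectors[0])
    total = 0.0
    count = 0
    for signs in itertools.product((-1, 1), repeat=len(vectors)):
        s = [0.0] * dim
        for eps, lam, u in zip(signs, multipliers, vectors):
            for i in range(dim):
                s[i] += eps * lam * u[i]
        total += max(abs(x) for x in s)
        count += 1
    return total / count

# Hypothetical vectors in R^3 and real multipliers with |lambda_n| <= 1.
u = [(1.0, 0.0, 2.0), (0.5, -1.0, 0.0), (3.0, 1.0, -1.0), (0.0, 2.0, 1.0)]
lam = [0.3, -1.0, 0.7, 0.0]

plain = expected_norm(u)            # E || sum eps_n u_n ||
contracted = expected_norm(u, lam)  # E || sum eps_n lambda_n u_n ||
```

The enumeration costs $2^n$ evaluations, so this is only usable for a handful of vectors; it is meant as a sanity check of the inequality, not as a proof.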

There is now a huge literature on random series in Banach spaces, with a special interest in Rademacher series, which play a basic role in the relation between geometry and probability in Banach spaces (Hoffmann-Jørgensen 1974 and 1977, Maurey and Pisier 1976, Garling 1977, the book of Pisier 1989, the book of Ledoux and Talagrand 1991, where a very large bibliography is given). Khintchin's inequalities, which express the behaviour of the norms $\|\sum \varepsilon_n u_n\|$ in different $L^p(\Omega)$, appear as a consequence of the strong integrability (Pisier 1978). The notions of type and cotype express the behaviour of the expectation $E\|\sum \varepsilon_n u_n\|$ with respect to the $\ell^p$ norm of $(\|u_n\|)$ (Maurey and Pisier). Isoperimetric methods were developed (Talagrand).

Let me mention only one comparison theorem of Talagrand (Theorem 4.12 in Ledoux-Talagrand 1991). First, using duality, the norm of a Rademacher sum (say, of $N$ terms) is expressed as

$$\Bigl\|\sum_{i=1}^{N} \varepsilon_i x_i\Bigr\|_B = \sup_{t \in T} \Bigl|\sum_{i=1}^{N} \varepsilon_i t_i\Bigr|$$

where $t = (t_1, \ldots, t_N) \in \mathbb{R}^N$ and $T$ is the image of the unit ball in $B'$ through the mapping $f \to (f(x_i))$. The comparison theorem involves a convex and increasing function $F$ ($\mathbb{R}^+ \to \mathbb{R}^+$) and a sequence of contractions $\varphi_i$ ($\mathbb{R} \to \mathbb{R}$) such that $\varphi_i(0) = 0$; it reads as

$$E\,F\Bigl(\tfrac12 \sup_{t \in T} \Bigl|\sum \varepsilon_i \varphi_i(t_i)\Bigr|\Bigr) \le E\,F\Bigl(\sup_{t \in T} \Bigl|\sum \varepsilon_i t_i\Bigr|\Bigr).$$

Previously such an inequality was obtained when the Rademacher sequence $\varepsilon_i$ is replaced by a normal sequence $\xi_i$, as a consequence of comparison theorems for Gaussian processes. Rademacher sums need a quite different treatment.

Open problems on Rademacher series in Banach spaces can be found in the book of Ledoux and Talagrand (section 12.3).
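The comparison inequality can be tested on a small finite set $T$ by enumerating the sign patterns. This is a sketch under assumptions: it uses the inequality in the form of Theorem 4.12 of Ledoux-Talagrand, with the factor $\tfrac12$ inside $F$ on the left-hand side; the set `T`, the contractions `phis` and the function `F` are hypothetical choices satisfying the hypotheses.

```python
import itertools
import math

def expected_F_of_sup(points, F, phis=None, scale=1.0):
    """Exact E F(scale * sup_{t in T} |sum_i eps_i * phi_i(t_i)|) over all sign patterns."""
    n = len(points[0])
    if phis is None:
        phis = [lambda x: x] * n          # identity maps: the plain Rademacher sum
    patterns = list(itertools.product((-1, 1), repeat=n))
    total = 0.0
    for signs in patterns:
        sup = max(
            abs(sum(e * phi(ti) for e, phi, ti in zip(signs, phis, t)))
            for t in points
        )
        total += F(scale * sup)
    return total / len(patterns)

# Hypothetical finite set T in R^3.
T = [(1.0, -2.0, 0.5), (0.0, 1.5, 1.0), (-1.0, 0.3, 2.0)]
# Contractions (1-Lipschitz maps) vanishing at 0.
phis = [math.sin, abs, lambda x: x / 2]
# A convex increasing function on the positive half-line.
F = lambda x: x * x

lhs = expected_F_of_sup(T, F, phis, scale=0.5)
rhs = expected_F_of_sup(T, F)
```

With these data `lhs` does not exceed `rhs`, as the theorem predicts; changing `phis` to maps that are not contractions can break the inequality.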

9.  Back  to  P.W.Z.'s  program  (“to  bridge  the  gap”  ) 

From the contraction principle, Rademacher and Steinhaus series in Banach spaces behave in the same way. If now we consider, in addition to (8.1) and (8.2), a Gaussian series

$$\sum_{n=1}^{\infty} \xi_n u_n \tag{9.1}$$

where $(\xi_n)$ is a normal sequence as in (1.7), the almost sure convergence of (8.1) by no means implies the same for (9.1).

However, if we consider Rademacher, Steinhaus and Gaussian Fourier series (1.2), (1.5), (1.8), they have the same probability to converge uniformly (which is the same as to represent a continuous function, or to represent a bounded function). This remarkable fact was discovered by Marcus and Pisier in 1978. The book of Marcus and Pisier "Random Fourier series with applications to harmonic analysis" contains the proof, together with an explicit condition on the $c_n$, and previous results of Pisier.

Here is the explicit condition. Consider

$$\psi(t) = \Bigl(\sum |c_n|^2 \sin^2 nt\Bigr)^{1/2}$$

and define

$$I(\psi) = \int_0 \frac{\bar\psi(x)\,dx}{x\sqrt{\log 1/x}}$$

where $\bar\psi$ is the increasing rearrangement of $\psi$. Then

$$I(\psi) < \infty$$

is the necessary and sufficient condition for (1.2), (1.5), and (1.8) to converge uniformly, or represent a continuous function, or represent a bounded function, a.s.

Actually this is nothing but a reformulation of the Dudley-Fernique theorem for stationary Gaussian processes. The usual form of the Dudley-Fernique theorem, for stationary Gaussian processes $X_t$ defined on the circle $\mathbb{T}$, involves the integral

$$I = \int_0 \sqrt{\log N(\varepsilon)}\,d\varepsilon$$

where $N(\varepsilon)$ is the minimum number of balls of radius $\varepsilon$ covering $\mathbb{T}$, in the metric defined by

$$d(t, t') = \bigl(E\,|X_t - X_{t'}|^2\bigr)^{1/2}.$$

The Dudley-Fernique theorem was the solution for stationary processes of a long-standing problem of Kolmogorov: how to recognize, from the geometry of the Gaussian (centered) process $(X_t)_{t \in K}$, if there are a.s. bounded, or a.s. continuous, versions? Surprisingly, the general problem has a solution obtained by Talagrand (1987). Here is a simple and weak form of the result of Talagrand (Theorem 12.10 in Ledoux-Talagrand). Let me start with examples of Gaussian processes with bounded versions:

1) $(Y_n)_{n \in \mathbb{N}}$ with $E\,Y_n^2 = O(1/\log n)$ (because $\sum \mathcal{P}(|Y_n| > \lambda) < \infty$ when $\lambda$ is large);

2) $(X_t)_{t \in K}$ in the convex hull of such a sequence $(Y_n)$.

Theorem: example 2 is the general case.

There is a weakness however in this remarkable result, as well as in other versions of Talagrand's theorem: how to recognize the $Y_n$ when $(X_t)$ is given? Therefore, the interest of the theorem of Dudley and Fernique is not abolished with the theorem of Talagrand. It can be added that the necessary and sufficient condition of Marcus and Pisier does not suppress the interest of looking at the $s_j$ ($= (\sum_{2^j \le |n| < 2^{j+1}} |c_n|^2)^{1/2}$) in order to get easy information on local regularity or irregularity.

10. An open question: bridging the gap for random Taylor series

I insisted on the non-continuation problem, but there are a lot of problems, works and results on random Taylor series; a partial bibliography can be found in SRSF 1985. I shall restrict myself to one question: what can be said on the range and the distribution of values of functions $F(z)$ which are sums of series (1.3) (Rademacher), (1.6) (Steinhaus), (1.9) (Gaussian) in the case when $\limsup |c_n|^{1/n} = 1$ (the unit disc is the domain of existence of the random analytic function $F(z)$) and $\sum |c_n|^2 = \infty$ (i.e., $F(z)$ does not belong to the Nevanlinna class)?

There is very good information for Gaussian Taylor series (SRSF 1985, chap. 13, and Kahane 1987): the range of $F(z)$ ($|z| < 1$) is a.s. the whole plane $\mathbb{C}$. Moreover, for some fixed sequence $r_\nu \to 1$, we have a.s. estimates for the Nevanlinna function

$$N(r, b, F) = \int_0^r n(s, b, F)\,\frac{ds}{s}$$

where $n(r, b, F)$ is the number of zeros of $F(z) - b$ in the disc $|z| \le r$, namely: it is almost sure that for all $b \in \mathbb{C}$,

1) $N(r_\nu, b, F) = N(r_\nu, 0, F) + O(1)$

2) $N(r_\nu, b, F) \sim \tfrac12 \log \rho(r_\nu)$, where

$$\rho(r) = \sum_{n=0}^{\infty} |c_n|^2 r^{2n} = E\bigl(|F(re^{it})|^2\bigr).$$

For Steinhaus Taylor series the first part is known, namely, the range of $F(z)$ ($|z| < 1$) is a.s. the whole plane $\mathbb{C}$ (Offord 1972; also, for a far-reaching generalization, Murai 1978).

For Rademacher Taylor series the question is open as far as I know. The best result which I know is due to Jacob and Offord (1983): if $\log N = O(\sum_{n=1}^{N} |c_n|^2)$ ($N \to \infty$), then the range is $\mathbb{C}$ a.s.

It should be added that the topic has a long history, going back to Littlewood and Offord 1948-1949, who considered the distributions of the zeros of $F(z) - a$ when $F(z)$ is a random entire function.

There is also a remaining question, even in the case of Gaussian Taylor series. Is it true that either such a series represents a.s. a continuous function on the closed disc $|z| \le 1$, or that it maps a.s. the open disc $|z| < 1$ onto the whole plane $\mathbb{C}$?

11. Some applications to harmonic analysis:
Sidon sets, the Pisier algebra, Riesz products

I have already given two applications of the P.Z. theory to Fourier series:

1) if $\sum |a_n|^2 = \infty$, $\sum \pm a_n e^{int} \notin L^1$ for some choice of signs $\pm$;

2) if $\sum |a_n|^2 < \infty$, $\sum b_n e^{int} \in C$ for some sequence $b_n$, $|b_n| \ge |a_n|$.

Moreover, I gave a series of applications of Brownian motion and special Gaussian processes.

Let me concentrate on another aspect for a while: the use of random polynomials or random series in order to study lacunary sets. A theorem of Sidon says that, if $\Lambda$ is a set of integers which is Hadamard-lacunary (the distance between two consecutive points is larger than some given fraction of their distances to the origin), and if a continuous function on $\mathbb{T}$ has its frequencies in $\Lambda$, its Fourier series converges absolutely. With obvious and classical notations we have

$$C_\Lambda = A_\Lambda. \tag{11.1}$$

Now we take (11.1) as a definition of a Sidon set in $\mathbb{Z}$.

A Sidon set has to be lacunary in some sense. The first method in order to see this is to use random trigonometric polynomials of the form

$$P(t) = \sum_{\lambda \in \Lambda} \pm c_\lambda e^{i\lambda t}, \qquad c_\lambda = 0 \text{ or } 1.$$

From the definition of a Sidon set there exists a constant $K = K(\Lambda)$ such that

$$\|P\|_A \le K \|P\|_C.$$

We may have an estimate of the form

$$\mathcal{P}\Bigl(\|P\|_C \le B \Bigl(\sum |c_\lambda|^2\Bigr)^{1/2}\Bigr) > 0.$$

For example, from estimates of Salem and Zygmund we can take $B = \sqrt{\log N}$, $N$ being the degree of $P$. Since

$$\sum |c_\lambda|^2 = \sum |c_\lambda| = \nu$$

(number of terms in $P$), a convenient choice of $\pm$ gives

$$\nu \le K B \nu^{1/2}$$

$$\nu \le K^2 B^2.$$

Using the estimate of Salem and Zygmund we see that the number of points of $\Lambda$ in $[-N, N]$ cannot exceed $K^2 \log N$. Using an estimate of the same kind for random trigonometric polynomials in several variables we have a more precise necessary condition: if $\Lambda$ is a Sidon set, there exists a constant $K'$ such that $\Lambda$ contains at most $K' s \log \nu$ elements of the form

$$\alpha + n_1\beta_1 + n_2\beta_2 + \cdots + n_s\beta_s$$

when $\alpha, \beta_1, \ldots, \beta_s$ are given real numbers and $n_1, n_2, \ldots, n_s$ are integers such that $|n_1| + |n_2| + \cdots + |n_s| \le \nu$ (Kahane 1957).
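The counting step behind the bound on the number of terms can be written out in one line. This is only a restatement of the argument, with $\nu$ the number of terms of $P$, all nonzero coefficients equal to 1, and a choice of signs for which the estimate on $\|P\|_C$ holds:

```latex
\nu = \sum_{\lambda} |c_\lambda| = \|P\|_A
    \le K \|P\|_C
    \le K B \Bigl(\sum_{\lambda} |c_\lambda|^2\Bigr)^{1/2}
    = K B \nu^{1/2}
\quad\Longrightarrow\quad
\nu \le K^2 B^2 .
```

The first equality uses that the norm in $A$ is the sum of the moduli of the coefficients, and the last step divides by $\nu^{1/2}$ and squares.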

In the opposite direction any quasi-independent set $\Lambda$ is Sidon. Quasi-independent means that $\sum_{\lambda \in \Lambda} \alpha_\lambda \lambda = 0$, $\alpha_\lambda = -1, 0, 1$, implies that all $\alpha_\lambda$ are 0. A finite union of quasi-independent sets is also Sidon. Until now it is not clear if this is also a necessary condition, nor if the necessary condition given above is also sufficient.
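For a finite set of integers, quasi-independence can be decided by brute force. The sketch below is an illustration only: the function name is hypothetical, and the enumeration of all $3^n$ coefficient patterns limits it to small sets.

```python
import itertools

def is_quasi_independent(numbers):
    """True if sum(a * x) = 0 with every a in {-1, 0, 1} forces all a to be 0."""
    for coeffs in itertools.product((-1, 0, 1), repeat=len(numbers)):
        if any(coeffs) and sum(a * x for a, x in zip(coeffs, numbers)) == 0:
            return False    # a nontrivial relation was found
    return True

powers_of_three = is_quasi_independent([1, 3, 9, 27])
powers_of_two = is_quasi_independent([1, 2, 4, 8])
with_relation = is_quasi_independent([1, 2, 3])   # 1 + 2 - 3 = 0
```

The first two sets are quasi-independent, while the third is not because of the relation $1 + 2 - 3 = 0$.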

In 1960 Rudin introduced another kind of lacunary set, the so-called $\Lambda_p$-sets, and proved in this connection that the $L^p$-norms of functions with spectrum in a Sidon set behave like the $L^p$-norms of a Rademacher series, namely $O(\sqrt{p})$ ($p \to \infty$).

A breakthrough in the theory of Sidon sets was made by Drury (1970) when he proved that a finite union of Sidon sets is a Sidon set; his method was to introduce $\Omega = \mathbb{T}^\Lambda$, and harmonic analysis on $\Omega$ as well as $\mathbb{T}$.

The next step was Rider 1975. Rider gives a new characterization of Sidon sets, which can be expressed as

$$C_{a.s.,\Lambda} = A_\Lambda. \tag{11.2}$$

Here $C_{a.s.}$ is the space of functions

$$f \sim \sum \hat f_n e^{int}$$

such that, for almost all changes of signs,

$$\sum \pm \hat f_n e^{int}$$

represents a continuous function (if this is true for all changes of signs, then $f \in A$). Actually Rider considered Steinhaus and not Rademacher series; we now know that it is the same, and also the same as Gaussian series.

Finally, using (11.2), Pisier was able to prove the converse of Rudin's theorem (1978). It was the most spectacular use of random Fourier series in order to study lacunary sets. The whole theory is expounded in the book of Marcus and Pisier 1981.

Here is another spectacular result of Pisier using $C_{a.s.}$ (1979), also described in Marcus-Pisier. Let us consider

$$C \cap C_{a.s.}$$

that is, the space of continuous functions on $\mathbb{T}$ such that a random change of signs of the Fourier coefficients gives a continuous function a.s. Pisier proves that Lipschitz functions operate on $C \cap C_{a.s.}$. As a consequence $C \cap C_{a.s.}$ is a Banach algebra which is strictly contained in $C$ and strictly contains $A$, such that the Lipschitz functions operate. This Pisier algebra gives a very neat answer to a problem of Katznelson (find homogeneous Banach algebras between $C$ and $A$, on which not only analytic functions and not all continuous functions operate), and it is a remarkable object in harmonic analysis.

Let us go back to (11.1), the definition of Sidon sets, for a while. (11.1) expresses that $C_\Lambda$ is isomorphic to $A_\Lambda$, which in turn is isomorphic to $\ell^1$. Now, given vectors $x_i$ in $\ell^1$,

$$\Bigl(\sum \|x_i\|^q\Bigr)^{1/q} \le C\, E\Bigl\|\sum \varepsilon_i x_i\Bigr\|$$

for all $q \ge 2$ (property of cotype 2).

Conversely, if $C_\Lambda$ is isomorphic to $\ell^1$, $\Lambda$ is a Sidon set (Varopoulos 1976). If $C_\Lambda$ has cotype 2, $\Lambda$ is a Sidon set (Pisier 1978). If $C_\Lambda$ has a finite cotype, $\Lambda$ is a Sidon set (Bourgain-Milman 1985, developed in Prignot 1987). This is the best that we know on the geometric properties of $C_\Lambda$ as a Banach space which are equivalent to the fact that $\Lambda$ is Sidon.

New characterizations of Sidon sets were given by Pisier (1983) and Bourgain (1985); without answering the questions we raised they are currently the most powerful. Bourgain proves the following implications

$$(1) \Rightarrow (2) \Rightarrow (3) \Rightarrow (4) \Rightarrow (1)$$

where

(1): $\Lambda$ is a Sidon set

(2): $\Lambda$ has the Rudin property on $L^p$ norms

(3): $\Lambda$ has the Pisier property, meaning that there exists a $\delta > 0$ such that each finite subset $A$ of $\Lambda$ contains a quasi-independent subset $B$ such that $|B| \ge \delta |A|$ ($|\cdot|$ is the cardinal)

(4) (Bourgain's property): there exists a $\delta > 0$ such that, given $(a_\lambda)_{\lambda \in \Lambda}$ vanishing outside a finite set, there exists a quasi-independent $A \subset \Lambda$ such that

$$\sum_{\lambda \in A} |a_\lambda| \ge \delta \sum_{\lambda \in \Lambda} |a_\lambda|.$$

Pisier already proved (1) $\Leftrightarrow$ (2) as we saw, and also (1) $\Leftrightarrow$ (3) (see the references in the announcement Pisier 1983). Bourgain's proof uses random sets of integers in order to get (2) $\Rightarrow$ (3) and an elementary and clever argument to obtain (3) $\Rightarrow$ (4). The last implication (4) $\Rightarrow$ (1) is obvious when you are familiar with Riesz products.

Riesz products are of the form

$$\prod_{\lambda \in A} \bigl(1 + \mathrm{Re}(c_\lambda e^{i\lambda t})\bigr) \tag{11.3}$$

where $A$ is a quasi-independent set of integers, and $|c_\lambda| \le 1$ for all $\lambda$. The condition on the coefficients guarantees that the partial products are positive, and the quasi-independence of $A$ guarantees that their normalized $L^1$-norm is 1. When $A$ is quasi-independent, Riesz products provide a way to express a bounded sequence $(c_\lambda)_{\lambda \in A}$ as the restriction to $A$ of the Fourier transform of a bounded measure. Here is another way to express Bourgain's property: there exists $\delta > 0$ such that, given $(a_\lambda)_{\lambda \in \Lambda}$ with $|a_\lambda| \le \delta$ ($\lambda \in \Lambda$), there exists a measure $\mu$ in the $\sigma$-convex envelope of Riesz products of the form (11.3), such that

$$\hat\mu(\lambda) = a_\lambda \qquad (\lambda \in \Lambda).$$
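The role of quasi-independence in the normalization can be seen by expanding a finite Riesz product into its Fourier coefficients. The sketch below makes simplifying assumptions: all coefficients are taken equal to 1, so that each factor is $1 + \cos \lambda t$, the frequency sets are hypothetical examples, and the function name is illustrative. Since the product is nonnegative, its normalized $L^1$-norm is its constant Fourier coefficient.

```python
from collections import defaultdict
from fractions import Fraction

def riesz_product(frequencies):
    """Fourier coefficients of prod (1 + cos(lam * t)), as {frequency: coefficient}."""
    poly = {0: Fraction(1)}
    half = Fraction(1, 2)
    for lam in frequencies:
        nxt = defaultdict(Fraction)
        for n, a in poly.items():
            # (1 + cos(lam t)) = 1 + e^{i lam t}/2 + e^{-i lam t}/2
            nxt[n] += a
            nxt[n + lam] += a * half
            nxt[n - lam] += a * half
        poly = dict(nxt)
    return poly

quasi = riesz_product([1, 3, 9])        # quasi-independent frequencies
not_quasi = riesz_product([1, 2, 3])    # 1 + 2 - 3 = 0 breaks quasi-independence
```

For the quasi-independent set the constant coefficient is exactly 1, while for $\{1, 2, 3\}$ the relation $1 + 2 - 3 = 0$ contributes two extra terms of size $1/8$ and the constant coefficient becomes $5/4$.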

The  classical  Riesz  products  are  of  the  form 

X 

[Jd  +Re|c,e‘'^''))  (11.41 

I 

where  the  Aj  are  positive  integers,  and 

Aj,,/Ai>3  (i  =  l,2,...).  (11.5) 

The important property of the sequence $(\lambda_j)$ is that it is dissociate, meaning
that there is at most one way to express any integer $n$ as a linear combination
$\sum a_j \lambda_j$, $a_j = -1, 0, 1$. Dissociate means more than quasi-independent.
There is an intimate relation between Riesz products and lacunary trigonometric
series (see Peyrière 1991, Fan Ai-Hua 1989), and Riesz products are a
mine for examples of measures in Fourier series, and, now, for multifractal
analysis. Here is a basic problem: let us fix the $\lambda_j$ and consider the measure
defined by the Riesz product (11.4) as a function of the sequence $c = (c_j)$
(we always suppose $|c_j| \le 1$), denoted by $\mu_c$. Given $c$ and $c'$, is it true that
either $\mu_c \perp \mu_{c'}$ or $\mu_c \sim \mu_{c'}$? How to express conditions of orthogonality or
equivalence in terms of $c$ and $c'$? The most promising result in this direction
comes from Kilmer and Saeki 1985. They randomize (11.4) in the form

    $\prod_{j=1}^{\infty} \bigl(1 + \mathrm{Re}(c_j e^{2\pi i\omega_j} e^{i\lambda_j t})\bigr)$    (11.6)

where $(\omega_j)$ is a Steinhaus sequence, and get a random measure $\mu_{\omega,c}$. Given
$c$ and $c'$, either $\mu_{\omega,c} \perp \mu_{\omega,c'}$ a.s. or $\mu_{\omega,c} \sim \mu_{\omega,c'}$ a.s. Moreover, the explicit
condition for the a.s. equivalence can be expressed as

    $\sum_{j=1}^{\infty} d^2(c_j, c'_j) < \infty$,

$d$ being the distance in the disc given by

    $ds^2 = d\theta^2 + (1 - \ldots \quad (z = re^{i\theta})$.

It is quite possible that this holds also for (11.4) under the assumption (11.5).
It can be checked that it holds if (11.5) is replaced by the stronger condition

    …

(11.6) is an example of a product of independent weight functions $P_j(t, \omega_j)$.
The general frame is $P_j(t, \omega_j) \ge 0$ and $E P_j(t, \omega_j) = 1$ for every $t$. The
random measures defined in this way have interesting properties (see for
example Kahane 1989 or Kahane 1991).
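
The two properties claimed for the partial products of (11.4), positivity and normalized $L^1$-norm equal to 1, can be checked numerically. The following is a minimal sketch, assuming the illustrative choice $\lambda_j = 3^j$ (which satisfies (11.5)) and an arbitrary set of coefficients with $|c_j| \le 1$; the function name and the data are hypothetical and not taken from the text.

```python
import math

def riesz_partial_product(t, coeffs, freqs):
    """Value at t of prod_j (1 + Re(c_j * exp(i * lambda_j * t)))."""
    value = 1.0
    for c, lam in zip(coeffs, freqs):
        rotation = complex(math.cos(lam * t), math.sin(lam * t))
        value *= 1.0 + (c * rotation).real
    return value

# Hypothetical example data: lambda_j = 3**j, and |c_j| <= 1 for every j.
freqs = [3 ** j for j in range(4)]            # 1, 3, 9, 27
coeffs = [1.0, 0.5j, -0.8, 0.3 + 0.4j]

# The partial product is a trigonometric polynomial of degree 1+3+9+27 = 40,
# so the mean over an equispaced grid with more than 40 points is exact
# (up to rounding).
n_grid = 128
values = [riesz_partial_product(2 * math.pi * k / n_grid, coeffs, freqs)
          for k in range(n_grid)]

min_value = min(values)               # every factor is >= 1 - |c_j| >= 0
mean_value = sum(values) / n_grid     # normalized L^1 norm, as values >= 0
```

Because the frequencies are dissociate, the only way to obtain the zero frequency when expanding the product is to take the constant term of every factor, which is why the mean equals 1.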

12.  Divergence  everywhere,  convergence  everywhere, 
and  random  coverings 

Let me turn to a topic which is very much in the spirit of P.Z. We consider a
sequence of positive functions $f_n$ on the circle $\mathbb{T}$ and consider the series of
random translates

    $\sum_{n=1}^{\infty} f_n(t - \omega_n)$.

For simplicity let us assume $0 \le f_n(t) \le 1$ for all $n$ and $t$. Convergence and
divergence almost everywhere create no problem; either one is almost sure
according to the convergence or divergence of

    $\sum_{n=1}^{\infty} \int_{\mathbb{T}} f_n(t)\,dt$.

How to express the almost sure divergence resp. convergence on the whole
of $\mathbb{T}$, or the almost sure divergence resp. convergence on a given part of $\mathbb{T}$?

The question of divergence is far from obvious even when $f_n = 1_{[0,\ell_n]}$,
and there are considerable difficulties for $f_n = c_n 1_{[0,\ell_n]}$. Let me explain the
situation in both cases.

For $f_n = 1_{[0,\ell_n]}$ the question is the condition of almost sure covering of
$\mathbb{T}$, or a given part of $\mathbb{T}$, by random intervals

    $I_n = [0, \ell_n] + \omega_n$.

For, if covering holds a.s., it holds infinitely many times. The question goes
back to Dvoretzky, and there were contributions by Erdős (though no proof
was published), Billard and myself; methods and results prior to 1968 are
expounded in the first edition of SRSF. Here is an idea of what was known
at this time:

■ when $\ell_n = \frac{1-\varepsilon}{n}$, covering of $\mathbb{T}$ has probability 0, and the uncovered set
has a.s. Hausdorff dimension $\varepsilon$; subsets of $\mathbb{T}$ of dimension $< 1 - \varepsilon$ are
covered, subsets of dimension $> 1 - \varepsilon$ are not covered a.s.;

■ when $\ell_n = \frac{1+\varepsilon}{n}$, covering of $\mathbb{T}$ is almost sure;

■ when $\ell_n = \frac{1}{n}$ it was undecided.

■ writing

    $k(t) = \exp \sum_{n=1}^{\infty} (\ell_n - t)^+$

($(\cdot)^+$ denoting the positive part),

    $\int_0^1 k(t)\,dt < \infty$

implies that the probability of covering $\mathbb{T}$ is 0, and

    $\int_0^1 k(t)\,\mu(dt) < \infty$

where $\mu$ is a positive measure, implies that the probability of covering
the support of $\mu$ is 0. Therefore, given a Borel set $A \subset \mathbb{T}$,

    $\mathrm{Cap}_k A > 0$

implies that $A$ is not covered a.s.
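
For experiments with the covering problem it helps to have a deterministic test of whether a finite family of arcs $[\omega_n, \omega_n + \ell_n]$ covers the circle. The sketch below assumes the circle is parametrized as $\mathbb{R}/\mathbb{Z}$ (total length 1) and that arcs are closed; the function name is hypothetical. It only decides the covering for finitely many given arcs and says nothing by itself about the almost sure statements above.

```python
def covers_circle(starts, lengths):
    """Return True if the closed arcs [s, s + l] (mod 1) cover R/Z."""
    pieces = []
    for s, l in zip(starts, lengths):
        if l >= 1.0:
            return True                 # a single arc already covers
        s %= 1.0
        if s + l <= 1.0:
            pieces.append((s, s + l))
        else:                           # the arc wraps around the point 0 ~ 1
            pieces.append((s, 1.0))
            pieces.append((0.0, s + l - 1.0))
    pieces.sort()
    reach = 0.0                         # [0, reach] is covered so far
    for a, b in pieces:
        if a > reach:
            return False                # the gap (reach, a) is uncovered
        reach = max(reach, b)
    return reach >= 1.0

# Two half-circles cover; one half-circle does not.
two_halves = covers_circle([0.0, 0.5], [0.5, 0.5])
one_half = covers_circle([0.0], [0.5])
wrapped = covers_circle([0.75, 0.2], [0.5, 0.6])
```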

The topic moved suddenly in 1971. Independently, B. Mandelbrot and
S. Orey solved the case $\ell_n = \frac{1}{n}$, and Shepp gave the final answer for the
covering of $\mathbb{T}$: a.s. covering holds if and only if

    $\int_0^1 k(t)\,dt = \infty$,

and the condition can be expressed in the elegant form

    $\sum_{n=1}^{\infty} n^{-2} \exp(\ell_1 + \ell_2 + \cdots + \ell_n) = \infty$,

assuming, as we can, $\ell_1 \ge \ell_2 \ge \ell_3 \ge \cdots$. Coverings of other bodies were
considered. A brief history of the topic up to 1985 is given in the second
edition of SRSF, therefore I skip the references.
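
As a worked illustration of the integral criterion, consider the family $\ell_n = a/n$ with a constant $a > 0$ (the symbol $a$ and the harmonic-sum estimate are introduced here only for the example). Only the indices $n \le a/t$ contribute to the sum defining $k(t)$, which gives, up to bounded factors:

```latex
\begin{align*}
\sum_{n=1}^{\infty}\Bigl(\frac{a}{n}-t\Bigr)^{+}
  &= \sum_{n \le a/t}\Bigl(\frac{a}{n}-t\Bigr)
   = a\log\frac{a}{t} + O(1) \qquad (t \to 0),\\
k(t) &\asymp t^{-a},\\
\int_0^1 k(t)\,dt = \infty &\iff a \ge 1 .
\end{align*}
```

This agrees with the cases listed earlier: no covering for $a = 1 - \varepsilon$, almost sure covering for $a = 1 + \varepsilon$, and almost sure covering in the borderline case $a = 1$.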

In 1987 I discovered, on the occasion of a course in Urbana, that

    $\mathrm{Cap}_k A = 0$

is necessary and sufficient for the a.s. covering of $A$ (in case the Lebesgue
measure of $A$ is positive, this means $k \notin L^1$). Different proofs (the initial
proof being the shortest) are given in a recent paper (Kahane 1990) where
also covering of $\mathbb{T}^d$ by random translates of homothetic bodies is considered
(final results are obtained in the case of simplexes; the cases of balls and
cubes are still open).

Let me turn to the case $f_n = c_n 1_{[0,\ell_n]}$; now, $c_n \downarrow 0$. Of course, it is
interesting only in the case of covering. When

    $\forall t \quad \sum_{n=1}^{\infty} f_n(t - \omega_n) = \infty$,    (12.1)

it means a kind of density (depending only on $(c_n)$) of the sets $\Lambda_t$ of integers
$n$ (depending on $t$) such that $t \in I_n$. This problem is introduced in the thesis
of Fan Ai-Hua (Orsay, 1989).

Fan Ai-Hua proved, in the case $\ell_n = \ldots$, that (12.1) has probability
1 when $c_n = \ldots$. On the other hand (12.1) has probability 0 when
… observation. There is a large gap and it does not
seem easy to fill it.

Here is an addition in the case $\ell_n = \ldots$. Now (12.1) has probability 0
when

    $\sum_{n=1}^{\infty} \frac{c_n}{n \log n} < \infty$.

This can be seen by integrating the series in (12.1) against a convenient
random measure. On the other hand, it has probability 1 when $c_n$ decreases
and satisfies

    $\sum_{n=1}^{\infty} c_{\varphi(n)} = \infty$,

where $\varphi(1) = 2$ and $\varphi(n) = 2^{\varphi(n-1)}$. The gap is still larger in this case.

The random measure to be considered is associated with the $f_n$ by
means of the usual operator $Q$ associated with a product of independent
weights of the form

    $P_n(t, \omega) = \exp\bigl(-\lambda_n 1_{[0,\ell_n]}(t - \omega_n)\bigr) / \bigl(\ell_n \exp(-\lambda_n) + 1 - \ell_n\bigr)$.

The choice of the $\lambda_n$ has to obey two conditions: 1) $Q$ should operate on
the Lebesgue measure $dt$, and give a random measure $Q(dt)$; 2) $Q(dt)$
should integrate the series in (12.1). An exposition for random coverings
and operators $Q$ associated with a product of independent weights is given
in (Kahane 1989) (MR 91e:60152).
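
The denominator in the weight is exactly what makes $E P_n(t, \omega) = 1$ for every $t$: when $\omega_n$ is uniform on the circle, the indicator equals 1 with probability $\ell_n$. The sketch below checks this numerically, assuming the circle is $\mathbb{R}/\mathbb{Z}$ and using hypothetical parameter values; the names `weight` and `mean_weight` are illustrative only.

```python
import math

def weight(t, omega, ell, lam):
    """exp(-lam * 1_[0,ell]((t - omega) mod 1)) divided by its expectation."""
    inside = ((t - omega) % 1.0) <= ell
    normalizer = ell * math.exp(-lam) + 1.0 - ell
    return (math.exp(-lam) if inside else 1.0) / normalizer

def mean_weight(t, ell, lam, n_grid=20_000):
    """Midpoint-rule approximation of E P(t, omega), omega uniform on [0, 1)."""
    total = sum(weight(t, (k + 0.5) / n_grid, ell, lam) for k in range(n_grid))
    return total / n_grid

# Hypothetical values of t, ell_n and lambda_n.
approx_mean = mean_weight(t=0.3, ell=0.25, lam=2.0)
```

The approximation error comes only from the grid points near the two endpoints of the arc, so it is of order `1 / n_grid`.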

Convergence everywhere is also interesting when $f_n = c_n 1_{[0,\ell_n]}$.
When

    $\forall t \quad \sum_{n=1}^{\infty} f_n(t - \omega_n) < \infty$,    (12.2)

it means a kind of scarcity of the sets $\Lambda_t$ considered before.

Let me mention a remarkable result of Fan Ai-Hua (unpublished) in
the case $\ell_n = n^{-\alpha}$ ($\alpha > 0$): (12.2) holds a.s. whenever $c_n$ decreases and
$\sum n^{-\alpha} c_n < \infty$. On the other hand, we know that $\sum \ell_n c_n = \infty$ implies
$\sum f_n(t - \omega_n) = \infty$ a.e. a.s.
Therefore, assuming $f_n = c_n 1_{[0,\ell_n]}$ and $(c_n)$ decreasing, $\sum n^{-\alpha} c_n < \infty$
is necessary and sufficient in order to have (12.2) a.s. The monotonicity
condition on $(c_n)$ is essential, as is clear by considering lacunary series of
the type

    …

($\chi_n = 1_{[0,\ell_n]}$; … given such that … $\downarrow 0$, and $n_j$ sparse enough).

The concept of empirical characteristic functionals in certain sequence
spaces is proposed. The convergence of the related empirical measures and
processes is linked to the idea of weak convergence along projective systems.
A review of multivariate empirical characteristic function techniques
is included. Some hints are given for the statistical inference on probability
distributions in sequence spaces.


1.  Introduction 

The empirical characteristic functions, following the initiation of their
systematic study in the pioneering paper by Feuerverger and Mureika [9], have
proved to be very efficient tools for stochastic analysis and inference
problems. As many distributional properties such as stable distributions and
distributions in abstract spaces can be characterized solely by characteristic
functions (or functionals), it is natural that the inference problems related
to such cases should be more favorably treated by empirical characteristic
functions rather than empirical distribution functions or densities.

Let $(S, \mathcal{F}, P)$ be a probability space and let $\Theta = (\Theta_1, \ldots, \Theta_n)$ be an
$\mathcal{F} \mapsto \mathcal{B}^n$ measurable mapping of $S$ into $\mathbb{R}^n$, inducing a probability measure
$\mu_n$ in $(\mathbb{R}^n, \mathcal{B}^n)$. Then $m$ independent observations $\Theta_j = (\Theta_{j1}, \ldots, \Theta_{jn})$,
$(j = 1, 2, \ldots, m)$ of the random (finite) sequence $\Theta(\omega)$ will yield a tableau of
the following form:

    $\begin{array}{cccc}
    \Theta_{11} & \Theta_{12} & \cdots & \Theta_{1n} \\
    \Theta_{21} & \Theta_{22} & \cdots & \Theta_{2n} \\
    \vdots & \vdots & & \vdots \\
    \Theta_{m1} & \Theta_{m2} & \cdots & \Theta_{mn}
    \end{array}$    (1.1)

so that $\Theta_j(\omega) = (\Theta_{j1}, \ldots, \Theta_{jn})$, $(j = 1, \ldots, m)$. For increasing size of
observations and the later reference to infinite sequences, the double array (1.1)
has been viewed as an expansive one. The empirical characteristic function
is defined by

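    $\chi_{nm}(t, \omega) := \int_{\mathbb{R}^n} \exp(i(t \cdot x))\, d\lambda_{nm}(\omega, x)
        = \frac{1}{m} \sum_{j=1}^{m} \exp(i(t \cdot \Theta_j(\omega)))
        = \frac{1}{m} \sum_{j=1}^{m} \exp\Bigl(i \sum_{s=1}^{n} t_s \Theta_{js}\Bigr), \quad t \in \mathbb{R}^n$    (1.2)

where $\lambda_{nm}$ is the empirical distribution associated with (1.1), i.e.:

    $\lambda_{nm}(\omega) = \frac{1}{m} \sum_{j=1}^{m} \delta(\Theta_j(\omega))$    (1.3)

($\delta(x)$: concentrated unit mass at $x \in \mathbb{R}^n$).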
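
The last expression in (1.2) translates directly into code. The following is a minimal sketch, assuming the tableau (1.1) is stored as a list of $m$ rows of length $n$; the function name `ecf` and the simulated Gaussian sample are illustrative assumptions, not part of the text.

```python
import cmath
import random

def ecf(t, observations):
    """Empirical characteristic function (1/m) * sum_j exp(i * (t . theta_j))."""
    m = len(observations)
    total = 0j
    for theta in observations:
        inner = sum(ts * th for ts, th in zip(t, theta))
        total += cmath.exp(1j * inner)
    return total / m

# Hypothetical data: m = 500 observations of a 3-dimensional Gaussian vector.
rng = random.Random(0)
sample = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(500)]

value_at_zero = ecf([0.0, 0.0, 0.0], sample)
value = ecf([0.5, -1.0, 0.2], sample)
mirrored = ecf([-0.5, 1.0, -0.2], sample)
```

As for any characteristic function, the value at $t = 0$ is 1, the modulus never exceeds 1, and the value at $-t$ is the complex conjugate of the value at $t$.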

By the Glivenko-Cantelli theorem, $\lambda_{nm}$ almost surely uniformly
converges to $\mu_n$ on $\mathbb{R}^n$. Furthermore it is well-known that on each bounded set
$K \subset \mathbb{R}^n$

    $\sup_{t \in K} |\chi_{nm}(t) - \chi_n(t)| \to 0$ a.s.    (1.4)

where $\chi_n(t) = \int_{\mathbb{R}^n} \exp(i(t \cdot x))\, d\mu_n(x)$. Furthermore if $\mu_n$ is singular with
respect to Lebesgue measure on $\mathbb{R}^n$, the supremum can be taken over all
of $\mathbb{R}^n$ [4, 9]. Several estimates have been given for the rate of convergence
in (1.4), cf. [4, 11]; see also Section 5.

More interesting and stronger modes of convergence occur in relation
to certain stochastic processes (fields) associated with empirical characteristic
functions. Two of the most important ones are:

1)
    $Y_m(t) = m^{1/2} (\chi_{nm}(t) - \chi_n(t)), \quad t \in \mathbb{R}^n$    (1.5)

2)
    $Z_m(t) = m^{1/2} \{ |\chi_{nm}(S_m^{-1/2} t)|^2 - |\chi_n(t)|^2 \}, \quad t \in \mathbb{R}^n$    (1.6)

where in 2, $S_m$ is the sample covariance matrix and $\chi_{nm}(S_m^{-1/2} t)$ is the
Mahalanobis transform of $\chi_{nm}$, under the assumption that the appropriate
conditions exist. 1 should be regarded as a complex-valued stochastic field.
Finite dimensional distributions of the process $Y_m(t)$ converge by the
multivariate central limit theorem to those of a complex valued n-variate Gaussian
random field $Y^{\mu_n}(t)$, which can be represented by the stochastic integral

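    $Y^{\mu_n}(t) = \int_{\mathbb{R}^n} \exp(i(t \cdot x))\, dB^{\mu_n}(x), \quad t \in \mathbb{R}^n$,

$B^{\mu_n}(x)$ being an n-variate Brownian bridge process associated with the
measure $\mu_n$. In other words $B^{\mu_n}$ is an n-variate Gaussian process satisfying for
$x, y, x' \in \mathbb{R}^n$:

    $E[B^{\mu_n}(x)] = 0$

    $E[B^{\mu_n}(x) B^{\mu_n}(y)] = \mu_n[\{x' : x' \le x \wedge y\}] - \mu_n[\{x' : x' \le x\}]\, \mu_n[\{x' : x' \le y\}]$

    $\lim_{x_j \to -\infty} B^{\mu_n}(x_1, \ldots, x_n) = 0, \quad j = 1, \ldots, n$

    $\lim_{(x_1, \ldots, x_n) \to (\infty, \ldots, \infty)} B^{\mu_n}(x_1, \ldots, x_n) = 0$.

It is shown that $Y^{\mu_n}(t)$ has the same covariance structure as $Y_m(t)$, i.e.,
$E[Y^{\mu_n}(t) \overline{Y^{\mu_n}(s)}] = \chi_n(t - s) - \chi_n(t) \chi_n(-s)$ and $E Y^{\mu_n}(t) = 0$.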
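
The covariance of $Y_m$ can be verified in a few lines. The sketch below assumes that the covariance is taken with a complex conjugate on the second factor and uses only the independence of the observations and the identity $\overline{\chi_n(s)} = \chi_n(-s)$; $\Theta$ stands for a generic observation.

```latex
\begin{align*}
Y_m(t) &= m^{-1/2}\sum_{j=1}^{m}\bigl(e^{i(t\cdot\Theta_j)}-\chi_n(t)\bigr),\\
E\bigl[Y_m(t)\,\overline{Y_m(s)}\bigr]
 &= E\Bigl[\bigl(e^{i(t\cdot\Theta)}-\chi_n(t)\bigr)
           \bigl(e^{-i(s\cdot\Theta)}-\overline{\chi_n(s)}\bigr)\Bigr]\\
 &= \chi_n(t-s)-\chi_n(t)\,\chi_n(-s).
\end{align*}
```

The cross terms with $j \ne k$ vanish because the summands are independent and centered, and the $m$ diagonal terms are identical, which cancels the factor $1/m$.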

For $T > 0$ let a compact set $K_n$ be given by $K_n = [-T, T]^n$. The processes
$Y_m(t)$ induce probability measures in $\mathcal{C}^2(K_n)$. These measures will not
converge weakly to the distribution of $Y^{\mu_n}(t)$ unless the latter process has
continuous paths. As worked out in [4, 12], $Y_m$ converges weakly to $Y^{\mu_n}$ in
$\mathcal{C}^2(K_n)$ if and only if

    $\int_0^1 \frac{\varphi_n(s)}{s\, (\log \frac{1}{s})^{1/2}}\, ds < \infty$    (1.7)

where $\varphi_n(s)$ is the non-decreasing rearrangement of $(1 - \mathrm{Re}\, \chi_n(t))^{1/2}$.

To work out the weak limit of the process in 2 is much more difficult than
that of 1. Under the null hypothesis "the measure $\mu_n$ is normal with some
expected value vector and some non-singular covariance matrix," however,
the process becomes

    $Z_m(t) = m^{1/2} \{ |\chi_{nm}(S_m^{-1/2} t)|^2 - e^{-(t \cdot t)} \}$    (1.8)

and it converges weakly in $\mathcal{C}(K_n)$ to the sample continuous Gaussian process
$Z^{\mu_n}(t)$ satisfying $Z^{\mu_n}(t) = Z^{\mu_n}(-t)$, $E[Z^{\mu_n}(t)] = 0$ and having the
covariance structure

    $\rho_n = E[Z^{\mu_n}(s) Z^{\mu_n}(t)] = 4 e^{-(s \cdot s) - (t \cdot t)} \{ \cosh((s \cdot t)) - 1 - \tfrac{1}{2} (s \cdot t)^2 \}$,    (1.9)

(cf. [4]). $Z(t)$ has the further property that $Z(s)$ and $Z(t)$ are independent for
any pair of orthogonal vectors $s$ and $t$.

There has been little effort so far to generalize the empirical characteristic
function methods to infinite dimensional spaces, perhaps mainly due
to the fact that it is difficult to find genuine examples whereby random
elements of such spaces can be observed. Feuerverger and McDunnough
[8] considered an extension to strictly stationary ergodic time series and
introduced the concept of poly-characteristic functions and their empirical
version. This is, for fixed $k$, basically the characteristic function of the finite
dimensional random vectors $(\Theta_r, \Theta_{r-1}, \ldots, \Theta_{r-k})$ related to a discrete-time
stationary process $\Theta = (\Theta_1, \Theta_2, \ldots)$.

Çapar in [3] attempted a further generalization to discrete-time
non-stationary processes. Partially observed trajectories of such processes will
yield, in a limiting sense, random elements of certain sequence spaces. In
Sections 2 and 3 we outline the formalism and the basic properties related to
such a generalization.


2.  Empirical  characteristic  functionals 
in  sequence  spaces  and  related  properties 

Let $E$ be a real sequence space with a specified topology and let $\mathcal{B}_E$ be its
Borel $\sigma$-field. The characteristic functional of a probability distribution $\mu$ on
$(E, \mathcal{B}_E)$ is given by

    $\chi^{\mu}(f) := \int_E \exp(i \langle f, x \rangle)\, d\mu(x), \quad f \in F$.

Here $F$ will be

1) the sequence space $G$ if $E$ satisfies $E = G^*$ ($(\cdot)^*$: continuous dual)

2) $E^*$ if 1 does not hold.

Some examples of $(E, F)$ pairs would be $(\mathbb{R}^{\mathbb{N}}, \mathbb{R}_0^{\mathbb{N}})$, $(l_1, c_0)$, $(l_\infty, l_1)$, $(l_p, l_q)$,
$(c_0, l_1)$ etc. ($\mathbb{R}_0^{\mathbb{N}}$: the space of all sequences with finite length, $c_0 = [x \in \mathbb{R}^{\mathbb{N}} :
\lim_j x_j = 0]$, $1/p + 1/q = 1$).

If $E$ is reflexive, 1 and 2 yield the same $F$. Also for certain $E$ spaces,
the values of $\chi^{\mu}$ on $G$ may uniquely determine its values on the whole $E^*$
(e.g., $E = l_1$). The canonical projection onto the first $n$ coordinates will
be indicated by $\pi_n$. The projection of $\mu$ on $(\mathbb{R}^n, \mathcal{B}^n)$ and its restriction to
$\pi_n^{-1}(\mathcal{B}^n)$ are denoted by $\mu_n$ and $\bar{\mu}_n$ respectively. A superscript $(\cdot)^0$ on a
finite sequence will indicate augmentation to an infinite sequence by filling
out the rest of the positions by zero.

Now let $\Theta$ be an $\mathcal{F} \mapsto \mathcal{B}_E$ measurable mapping of $S$ into $E$, or more
generally into $\mathbb{R}^{\mathbb{N}}$ (the space of all real sequences), such that one of the
sufficient conditions for the induced probability measure to be concentrated on


{  521  Empirical  characteristic  functional  analysis  and  inference  in  sequence  spaces  } 


the particular sequence space is satisfied. We may suppose that the random
sequence $\Theta(\omega)$ is generated by a non-stationary random process or by any
other source which can be observed any number of times independently
under identical conditions. Such observations will yield a double array of
the same form as (1.1), where the rows represent the observed components
of independent random sequences $\Theta_m(\omega)$ $(m = 1, 2, \ldots)$. By assumption,
$\Theta$ and $\Theta_m$ $(m = 1, 2, \ldots)$ induce the same probability distribution, say $\mu$,
on $(E, \mathcal{B}_E)$.

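Definition 2.1. The empirical characteristic distribution $\bar{\lambda}_{nm}$ associated
with (1.1) is the random probability measure

    $\bar{\lambda}_{nm}(\omega) := \frac{1}{m} \sum_{j=1}^{m} \delta(\pi_n^{-1} \pi_n \Theta_j(\omega))$

defined on $(E, \pi_n^{-1}(\mathcal{B}^n))$ and concentrated on $m$ atoms $\pi_n^{-1} \pi_n \Theta_j(\omega)$, for
$(j = 1, \ldots, m)$. (Thus $\lambda_{nm} := \pi_n \bar{\lambda}_{nm}$ is concentrated on $m$ points in $\mathbb{R}^n$.)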


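Definition 2.2. The empirical characteristic functional (e.c.fl.) $\chi_{nm}$
associated with (1.1) is defined as:

    $\chi_{nm}(f, \omega) := \int_E \exp(i \langle (\pi_n f)^0, x \rangle)\, \bar{\lambda}_{nm}(\omega, dx), \quad f \in F$.

The e.c.fl. can alternatively be expressed as:

    $\chi_{nm}(f) = \int_{\mathbb{R}^n} \exp(i((\pi_n f) \cdot y))\, \lambda_{nm}(\omega, dy)
        = \frac{1}{m} \sum_{j=1}^{m} \exp\Bigl(i \sum_{k=1}^{n} f_k \Theta_{jk}\Bigr)$

where $\langle \cdot, \cdot \rangle$ denotes the bilinear form of the $E, F$ duality. In the following
Glivenko-Cantelli type theorem $\chi_n(f)$ is the characteristic functional of the
projected distribution, i.e.:

    $\chi_n(f) = \int_E \exp(i \langle (\pi_n f)^0, x \rangle)\, d\bar{\mu}_n(x)
        = \int_{\mathbb{R}^n} \exp[i((\pi_n f) \cdot y)]\, \mu_n(dy), \quad f \in F$.    (2.1)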


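Theorem 2.3.

i: $\lim_{n \to \infty} \chi_n(f) = \chi^{\mu}(f)$, for $f \in F$. The convergence is uniform on
compact subsets of $F$ if $(E, F) = (\mathbb{R}^{\mathbb{N}}, \mathbb{R}_0^{\mathbb{N}})$ or $(l_1, c_0)$ or $(l_p, l_q)$, $(p \ge 1)$.
($\mathbb{R}_0^{\mathbb{N}}$: the space of all sequences with finite length.)

ii: $\lim_{n,m \to \infty} \chi_{nm}(f) = \chi^{\mu}(f)$ a.s., if $(E, F) = (\mathbb{R}^{\mathbb{N}}, \mathbb{R}_0^{\mathbb{N}})$ or if $\Theta$ is a.s.
bounded and $(E, F) = (c_0, l_1)$, $(l_p, l_q)$, $(l_\infty, l_1)$.

iii: The convergence in ii) is uniform on compact subsets of $F$ if $(E, F) =
(\mathbb{R}^{\mathbb{N}}, \mathbb{R}_0^{\mathbb{N}})$ or $(l_p, l_q)$.


Proof. [2, 3] ∎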

The study of convergence of empirical measures in sequence spaces
and convergence of measures induced by the empirical process creates some
difficulty, for $\chi_{nm}$ (or $\lambda_{nm}$) and $\mu$ are not defined on a common underlying
measurable space. There is a similar situation for the measures induced
by the finite dimensional empirical processes and the distribution of the
limiting process.

In  such  instances  the  abstract  concept  of  'weak  convergence  along  a 
projective  system',  which  is  outlined  in  the  next  section,  has  proved  to  be 
useful.  Alternative  treatments  via  cylindrical  measures  or  set  martingales  in 
the  limit  can  also  be  given. 

3.  Weak  convergence  of  probability  measures 
along  a  projective  system 

Following the notation and the terminology of [15], we consider projective
systems of Hausdorff topological spaces of the form

    $\{(\Omega_\alpha, \pi_{\alpha\beta})_{\alpha \le \beta} : \alpha, \beta \in D\}$

having the projective limit $\Omega = \varprojlim(\Omega_\alpha, \pi_{\alpha\beta})$ with continuous canonical
mappings $\pi_\alpha : \Omega \to \Omega_\alpha$. The right-filtering partially ordered set $D$ and all
other symbols are assumed to have their usual meaning and properties.

In relation with such a projective system we consider two hypotheses:

Hypothesis $R_1$: $\pi_\alpha^{-1}$, $(\alpha \in D)$ commutes with the operation of forming
the rim, i.e., $\pi_\alpha^{-1}(rA) = r(\pi_\alpha^{-1} A)$, where $r(A) = A \cap \overline{A^c}$.

Hypothesis $R_2$: For every $\alpha \in D$, $\pi_\alpha \mathcal{B} \subset \mathcal{B}_\alpha$ holds, where $\mathcal{B}$ and $\mathcal{B}_\alpha$
are Borel $\sigma$-fields in $\Omega$ and $\Omega_\alpha$ respectively, the former being with
respect to the projective limit topology.

Hypothesis $R_1$ is satisfied by many important projective systems
including those where each $\pi_\alpha$ $(\alpha \in D)$ is an open mapping. (In this case
Hypothesis $R_1$ is actually equivalent to the stronger property $\pi_\alpha^{-1}(bA) =
b(\pi_\alpha^{-1} A)$ ($b(\cdot)$ = boundary).) This would be the case if for instance the
projective limit topology coincides with the product topology.

Hypothesis $R_2$ would be ensured, e.g., if $\Omega$ and $\Omega_\alpha$ $(\alpha \in D)$ are Polish
spaces and $\mathcal{B}$ and $\mathcal{B}_\alpha$ are replaced by $\sigma$-fields of subsets which are measurable
for the completion of probability measures on Borel sets, thus containing
analytic sets (cf. [7, p. 391]).

Definition 3.1. Let $\mathcal{P} = \{(\Omega_\alpha, \pi_{\alpha\beta})_{\alpha \le \beta} : \alpha, \beta \in D\}$ be a projective system
of metrizable spaces with the projective limit $\Omega = \varprojlim(\Omega_\alpha, \pi_{\alpha\beta})$ furnished
with the projective limit topology and let the $\sigma$-fields $\mathcal{B}_\alpha$ and $\mathcal{B}$, in $\Omega_\alpha$ and
$\Omega$ respectively, satisfy Hypothesis $R_2$. If $\{\mu_\alpha, \alpha \in D\}$ is a net of probability
measures defined on measurable spaces $(\Omega_\alpha, \mathcal{B}_\alpha)$, $\alpha \in D$, we say that $\mu_\alpha$
converges weakly along the projective system $\mathcal{P}$ to a probability measure $\mu$
on $\mathcal{B}$, denoted by $\mu_\alpha \xrightarrow{w\text{-}\mathcal{P}} \mu$, if for every $\mu$-continuity set $B \in \mathcal{B}$

    $\lim_{\alpha} \mu_\alpha(\pi_\alpha B) = \mu(B)$

holds.

Note. In the case of a Polish projective limit, $\mathcal{B}$ and $\mathcal{B}_\alpha$ may be chosen as
the $\sigma$-fields obtained by the completion of Borel probability measures $\mu$ and
$\pi_\alpha(\mu)$ respectively.

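The following version of Alexandroff's second theorem is valid for this
type of convergence.

Theorem 3.2. Let $\{\mu_\alpha, \alpha \in D\}$ be a net of probability measures on a projective
system $\mathcal{P}$ as described in Definition 3.1, satisfying further Hypothesis $R_1$. If
$\mathcal{P}$ is sufficiently rich (i.e., $\pi_\alpha \Omega = \Omega_\alpha$) and $\mu$ is tight on $\Omega$, then the following
are equivalent:

1) $\mu_\alpha \xrightarrow{w\text{-}\mathcal{P}} \mu$, $\alpha \in D$.

2) Let $f \in C(\Omega_\beta)$, $\beta \in D$, and if $\alpha \in D$, $\alpha \ge \beta$, let $f^{\alpha}$ and $f^{\Omega}$ be the lifts
of $f$ to $\Omega_\alpha$ and $\Omega$ respectively (i.e., $f^{\alpha}(x) = f(\pi_{\beta\alpha} x)$, $f^{\Omega}(x) = f(\pi_\beta x)$).
Then $\lim_{\alpha \ge \beta} \mu_\alpha(f^{\alpha}) = \mu(f^{\Omega})$.

3) The same conclusion as in 2, $C(\Omega_\beta)$ being replaced by the set of
bounded uniformly continuous functions.

4) For $\beta \in D$, let $F \in \mathcal{B}_\beta$ be a closed subset of $\Omega_\beta$, then

    $\limsup_{\alpha \ge \beta} \mu_\alpha(\pi_{\beta\alpha}^{-1} F) \le \mu(\pi_\beta^{-1} F)$.

5) For $\beta \in D$, let $G \in \mathcal{B}_\beta$ be an open subset of $\Omega_\beta$, then

    $\liminf_{\alpha \ge \beta} \mu_\alpha(\pi_{\beta\alpha}^{-1} G) \ge \mu(\pi_\beta^{-1} G)$.


Proof. [2]. ∎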

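We can also state the following tightness versus relative compactness
type result.

Theorem 3.3. Let the projective system $\mathcal{P}$ as described in Definition 3.1 have
a separable, metrizable projective limit. Further assume that for each $\varepsilon > 0$,
there exists a compact subset $K_\varepsilon$ of $\Omega$ such that $\mu_\alpha(\pi_\alpha K_\varepsilon) \ge 1 - \varepsilon$ for every
$\alpha \in D$ is satisfied by a net $\{\mu_\alpha\}_{\alpha \in D}$ of probability measures. Then $\{\mu_\alpha\}$ has a
subnet converging along the projective system $\mathcal{P}$.

Proof. Let $\bar{\mu}_\alpha$ be the image of $\mu_\alpha$ on $\pi_\alpha^{-1} \mathcal{B}_\alpha$, i.e., $\bar{\mu}_\alpha \circ \pi_\alpha^{-1} = \mu_\alpha$, and let
$\mu_\alpha^*$ $(\alpha \in D)$ be any set of extensions of $\bar{\mu}_\alpha$ to $(\Omega, \mathcal{B})$. Such extensions
always exist but may consist of measures which are only finitely additive. On the
other hand $\Omega$ can be imbedded topologically into a compact metric space $\tilde{\Omega}$. For
any $\mu_\alpha^*$, let $m_\alpha$ be the measure on $\tilde{\Omega}$ defined by $m_\alpha(B) = \mu_\alpha^*(B \cap \Omega)$ for all Borel
subsets of $\tilde{\Omega}$. The net $\{m_\alpha\}$ has a subnet, say $\{m_{N_\alpha}\}_{\alpha \in D}$, converging weakly
to a $\sigma$-additive measure $\nu$ on $\tilde{\Omega}$. For any index $\alpha$, let $C_{\varepsilon,\alpha} = \pi_\alpha^{-1} \pi_\alpha K_\varepsilon$,
which is compact in $\Omega$. Let $\beta$ be a fixed index, then by the ordinary weak
convergence of measures and the fact that $C_{\varepsilon,\alpha} \subset C_{\varepsilon,\beta}$ for $\alpha \ge \beta$:

    $\nu(C_{\varepsilon,\beta}) \ge \limsup_{\alpha \ge \beta} m_{N_\alpha}(C_{\varepsilon,\beta}) \ge \limsup_{\alpha \ge \beta} m_{N_\alpha}(C_{\varepsilon,N_\alpha})
        = \limsup_{\alpha \ge \beta} \mu_{N_\alpha}(\pi_{N_\alpha} K_\varepsilon) \ge 1 - \varepsilon$.

By considering a sequence $\varepsilon_n \downarrow 0$, this set of inequalities implies along the
same line as in the proof of Theorem 6.7 of [14], that there exists a measure
$\mu$ on $\Omega$ such that $\nu(B) = \mu(B \cap \Omega)$ for any Borel set $B \subset \tilde{\Omega}$. Let now $F$ be any
closed subset of $\Omega_\beta$. There exists a closed set $D$ in $\tilde{\Omega}$ such that $\pi_\beta^{-1} F = D \cap \Omega$.
As $m_{N_\alpha} \to \nu$ weakly on $\tilde{\Omega}$, we have $\limsup_\alpha m_{N_\alpha}(D) \le \nu(D)$. This is the same
thing as stating $\limsup_\alpha \mu_{N_\alpha}^*(\pi_\beta^{-1} F) \le \mu(\pi_\beta^{-1} F)$. Now for $N_\alpha \ge \beta$

    $\limsup_\alpha \mu_{N_\alpha}^*(\pi_\beta^{-1} F) = \limsup_\alpha \bar{\mu}_{N_\alpha}(\pi_\beta^{-1} F)
        = \limsup_\alpha \mu_{N_\alpha}(\pi_{\beta N_\alpha}^{-1} F) \le \mu(\pi_\beta^{-1} F)$.

Then by part 4 of Theorem 3.2, $\mu_{N_\alpha} \xrightarrow{w\text{-}\mathcal{P}} \mu$. ∎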

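In the definition of convergence along projective systems, measures
can be replaced by measurable mappings. Thus if $(S, \mathcal{F}, P)$ is a probability
space, $\mathcal{P}$ a projective system and $U_\alpha : S \to \Omega_\alpha$ $(\alpha \in D)$ is a net of
$\mathcal{F} \mapsto \mathcal{B}_\alpha$ measurable mappings, we say that $U_\alpha$ converges to an $\mathcal{F} \mapsto \mathcal{B}$ measurable
mapping $U : S \to \Omega$ along the projective system $\mathcal{P}$, and write $\lim_\alpha U_\alpha \overset{w\text{-}\mathcal{P}}{=} U$, if
the probability measures $\mu_\alpha$ induced by $U_\alpha$ on $(\Omega_\alpha, \mathcal{B}_\alpha)$ converge weakly
along $\mathcal{P}$ to the measure induced by $U$ on $(\Omega, \mathcal{B})$.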

4.  Weak  convergence  of  empirical  measures 
and  processes  in  the  sequence  spaces 

Returning  to  the  probabilistic  scheme  and  terminology  of  Section  2,  we  can 
describe  the  convergence  of  empirical  measures  as  follows. 




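Theorem 4.1. Let $\mu$ be a probability measure induced on $(\mathbb{R}^{\mathbb{N}}, \mathcal{B}^{\mathbb{N}})$ by $\Theta$ and let
$(\mathbb{R}^n, \mathcal{B}^n, \mu_n)$ and $(\mathbb{R}^n, \mathcal{B}^n, \lambda_{nm})$ extend over all analytic sets (e.g., obtained
by completion). If $m_n \to \infty$ as $n \to \infty$, then $\lambda_{n m_n} \xrightarrow{w\text{-}\mathcal{P}} \mu$ a.s. along the
projective system $\mathcal{P} = \{(\mathbb{R}^n, \pi_{n_1 n_2})_{n_1 \le n_2} : n_1, n_2 \in \mathbb{N}\}$.


Proof. Let $\lambda_{n m_n}^*$ $(n = 1, 2, \ldots)$ be a set of arbitrary extensions of $\bar{\lambda}_{n m_n}$ from
$\pi_n^{-1}(\mathcal{B}^n)$ to $\mathcal{B}^{\mathbb{N}}$ (such extensions always exist). Since $\chi_{n m_n}(f)$ depend only
on finite dimensional restrictions of measures, except on a null set we have
by Theorem 2.3,

    $\lim_{n \to \infty} \int_{\mathbb{R}^{\mathbb{N}}} \exp(i \langle f, x \rangle)\, \lambda_{n m_n}^*(\omega, dx) = \chi^{\mu}(f), \quad f \in \mathbb{R}_0^{\mathbb{N}}$.

By the analogue of Levy-Cramer theorem (cf. [17], Theorem 1.2.8.) $\lambda_{n m_n}^* \to \mu$
weakly a.s. as $n \to \infty$. If $B$ is a finite dimensional set, then for large $n$:
$\pi_n^{-1} \pi_n B = B$, $\lambda_{n m_n}(\pi_n B) = \lambda_{n m_n}^*(B)$ and the conclusion follows
immediately. If $B$ is an infinite dimensional set, it will have no interior with
respect to Tikhonov's topology, thus $\mu(B) = 0$. As $\{\pi_n^{-1} \pi_n B\}$ is a decreasing
sequence of universally measurable sets, letting $C = \bigcap_n \pi_n^{-1} \pi_n B$, we have
$B \subset C \subset \bar{B}$ and thus $\lim_n \mu(\pi_n^{-1} \pi_n B) = \mu(C) = 0$. There
exists a sequence $\{n_k\}$ of positive integers and a decreasing sequence $\{C_k\}$ of
$\mu$-continuity sets satisfying:

1) $C_k \supset B$

2) $\lim_{k \to \infty} \mu(C_k) = 0$


The double limit $\lim_{n,k \to \infty} \lambda_{n m_n}^*(C_k)$ exists and is equal to zero, for
there are positive integers $k_0$ and $n_0$ such that for $k \ge k_0$ and $n \ge n_0$ we
have $|\mu(C_k) - \mu(B)| = \mu(C_k) < \frac{\varepsilon}{2}$ and $|\lambda_{n m_n}^*(C_{k_0}) - \mu(C_{k_0})| < \frac{\varepsilon}{2}$,
therefore $|\lambda_{n m_n}^*(C_{k_0}) - \mu(B)| < \varepsilon$. But $\lambda_{n m_n}^*(C_k) \le \lambda_{n m_n}^*(C_{k_0})$ and $\mu(B) = 0$.
Therefore $\lim_{n,k \to \infty} \lambda_{n m_n}^*(C_k) = 0$ and this implies $\lambda_{n m_n}^*(\pi_n^{-1} \pi_n B) =
\lambda_{n m_n}(\pi_n B) \to 0 = \mu(B)$ a.s. ∎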

In order to suppress path dependence of the convergence consider
$\lambda_{n m_n}$ as random measures, i.e., as measurable mappings $S \to M_n$, where $M_n$
denotes the set of all probability measures on $(\mathbb{R}^n, \mathcal{B}^n)$. Let also the set of all
probability measures on $(\mathbb{R}^{\mathbb{N}}, \mathcal{B}^{\mathbb{N}})$ be denoted by $M$. Then

Theorem 4.2. For every sequence $m_n$ of positive integers tending to infinity
as $n \to \infty$, we have $\lim_n \lambda_{n m_n} \overset{w\text{-}\mathcal{P}}{=} \Lambda_\mu$ where $\Lambda_\mu : S \to M$ is the
measurable mapping with distribution degenerate at $\mu \in M$.


Proof. [3] ∎

Note. Another viewpoint for the convergence of measures in Theorem 4.1
and Theorem 4.2 could be suggested via the concept of set martingale in the
limit. Let

    $\{(\Omega_\alpha, \Sigma_\alpha, P_\alpha, \pi_{\alpha\beta})_{\alpha < \beta} : \alpha, \beta \in D\}$

be a projective system of probability measures having the property of
sequential maximality. It is well-known that with such a system there is associated
a set martingale ([15, Section 3.1, Proposition 7]). In parallel to Blake's
definition of a 'weak martingale in the limit' [1], one can introduce the concept of
a 'weak set martingale in the limit' for a system $\{(\nu_{m\alpha}, \Omega_\alpha, \Sigma_\alpha) : m = 1, 2, \ldots\}$
of probability spaces if $\nu_{m\alpha} \to P_\alpha$ weakly as $m \to \infty$ $(\alpha \in D)$, where $\{P_\alpha, \alpha \in D\}$
is a set martingale with base $\{\Sigma_\alpha, \alpha \in D\}$. Then Theorem 4.1 can be
rephrased as: $\{(\lambda_{n m_n}, \mathbb{R}^n, \mathcal{B}^n) : m = 1, 2, \ldots\}$ is a set martingale in the
limit with the underlying set martingale $\{\bar{\mu}_n : n = 1, 2, \ldots\}$ which has the base
$\{\pi_n^{-1} \mathcal{B}^n : n = 1, 2, \ldots\}$.

Now the empirical processes given by (1.5) and (1.6) are modified for
sequence spaces as:

    $Y_{nm}(f) := m^{1/2} (\chi_{nm}(\pi_n f) - \chi_n^{\mu}(\pi_n f)), \quad f \in F$    (4.1)

    $Z_{nm}(f) := m^{1/2} \{ |\chi_{nm}(S_{nm}^{-1/2}(\pi_n f))|^2 - |\chi_n^{\mu}(\pi_n f)|^2 \}, \quad f \in F$.    (4.2)

These should be regarded as processes ((4.1) is complex-valued and (4.2)
real) with a generalized index set. (Here $F$ is in general a topological vector
space.) Like in the ordinary cases, as given by (1.5) and (1.6), the weak
convergence of processes should be restricted to compact subsets. Now if
$(E, F) = (\mathbb{R}^{\mathbb{N}}, \mathbb{R}_0^{\mathbb{N}})$, the compact sets in $F$ consist of elements that have their
lengths bounded by a common integer, say $d$, and their first $d$ coordinates
determining a compact set in $\mathbb{R}^d$. Then for $n \ge d$

    $Y_{nm}(f) = m^{1/2} (\chi_{nm}(f) - \chi_d^{\mu}(f))$    (4.3)

where $f$ runs over a compact subset $K$ of $\mathbb{R}_d^{\mathbb{N}}$ (set of sequences with lengths
bounded by $d$). Since for all $n \ge d$, $\chi_{nm}(f)$ is essentially the same as the
empirical characteristic function obtained by $m$ independent observations of
the $d$-long random vector, (4.3) is equivalent to a $d$-dimensional multivariate
process as outlined in the introduction. In particular if $K = K_d$, then the
conclusion regarding the weak convergence in $\mathcal{C}^2(K_d)$ will be valid under the
condition (1.7). Note that this condition is satisfied uniformly in $d$ if

    $\int_S (\log^+ \|\Theta(\omega)\|_2)^{1+\varepsilon}\, dP(\omega) < \infty$    (4.4)

for small $\varepsilon$ (cf. [5]).

Similarly the multivariate results can be applied to the process (4.2)
whenever $K = K_d \subset \mathbb{R}_0^{\mathbb{N}} = F$.

For fixed $n$ and arbitrary dual pairs $(E, F)$, the process (4.1) (resp.
(4.2)) can be restricted, in view of (4.3), to compact (bounded) subsets of $\mathbb{R}^n$,
thus converges weakly in $\mathcal{C}^2(K_n)$ (resp. $\mathcal{C}(K_n)$) to the Gaussian processes as
described in the introduction. (In (4.2) $|\chi_n^{\mu}(\pi_n f)|^2$ is replaced by $e^{-(\pi_n f \cdot \pi_n f)}$.)
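
The reduction behind (4.3) can be made concrete: when $f$ has length $d$, the empirical characteristic functional built from the first $n \ge d$ coordinates does not depend on $n$. The sketch below assumes that $f$ is stored as a finite list (an element of the space of finite-length sequences) and that each observed row is a list; the name `ecfl` and the sample data are hypothetical.

```python
import cmath

def ecfl(f, observations, n):
    """E.c.fl. using only the first n coordinates of f and of each observation."""
    m = len(observations)
    total = 0j
    for theta in observations:
        # zip stops at the shorter sequence, so missing entries of f act as 0.
        inner = sum(fk * tk for fk, tk in zip(f[:n], theta[:n]))
        total += cmath.exp(1j * inner)
    return total / m

# Hypothetical data: m = 3 observed rows of length 6, and f of length d = 2.
rows = [
    [0.2, -1.0, 0.7, 1.5, -0.3, 0.0],
    [1.1, 0.4, -0.6, 0.9, 2.0, -1.2],
    [-0.5, 0.3, 0.8, -0.1, 0.6, 1.4],
]
f = [0.5, -2.0]

at_d = ecfl(f, rows, 2)      # n = d
at_5 = ecfl(f, rows, 5)      # n > d gives the same value
at_1 = ecfl(f, rows, 1)      # n < d uses a truncated f and differs in general
```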

The measures induced by the processes $Y_{nm}$ (or $Z_{nm}$) are not defined
on a common underlying measurable space, neither do they form an inverse
system since they are not necessarily compatible. We investigate their
convergence behavior within the concept of weak convergence along projective
systems as described in Section 3. For the multicubes $K_n = [-T, T]^n$, $(T > 0)$,
we define projection mappings $\gamma_{n_1 n_2} : \mathcal{C}(K_{n_2}) \to \mathcal{C}(K_{n_1})$ $(n_1 < n_2)$ by:

    $(\gamma_{n_1 n_2} f)(x_1, \ldots, x_{n_1}) = f(x_1, \ldots, x_{n_1}, 0, \ldots, 0) \quad \forall f \in \mathcal{C}(K_{n_2})$    (4.5)

The mappings $\gamma_{n_1 n_2}$ are continuous and satisfy $\gamma_{n_1 n_2} \circ \gamma_{n_2 n_3} = \gamma_{n_1 n_3}$ for
$n_1 < n_2 < n_3$. (For complex-valued functions $\gamma_{n_1 n_2} : \mathcal{C}^2(K_{n_2}) \to \mathcal{C}^2(K_{n_1})$.)
On the other hand if $\lambda$ is a measure on $(\mathcal{C}(K_{n_2}), \mathcal{B}_{\mathcal{C}(K_{n_2})})$, $\gamma_{n_1 n_2}(\lambda)$ is defined
as the ordinary image measure.
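
The projection mappings of (4.5) simply pad the argument with zeros, and the compatibility relation between them follows at once. The sketch below represents an element of $\mathcal{C}(K_n)$ as a Python function of $n$ real arguments; the name `gamma` and the test function `f3` are hypothetical.

```python
def gamma(n1, n2, f):
    """Projection C(K_{n2}) -> C(K_{n1}) of (4.5): pad the argument with zeros."""
    if not n1 < n2:
        raise ValueError("gamma is defined for n1 < n2")

    def projected(*x):
        if len(x) != n1:
            raise ValueError("expected %d arguments" % n1)
        return f(*x, *([0.0] * (n2 - n1)))

    return projected

# Hypothetical element of C(K_3).
def f3(x, y, z):
    return x + 2.0 * y + 3.0 * z + x * z

# Compatibility: gamma_{1,2} o gamma_{2,3} = gamma_{1,3}.
two_steps = gamma(1, 2, gamma(2, 3, f3))
one_step = gamma(1, 3, f3)
```

Both `two_steps(x)` and `one_step(x)` evaluate `f3(x, 0, 0)`, which is the relation needed for the family of spaces to form a projective system.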

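Now reconsider the processes $Y_{nm}$ and $Z_{nm}$, the former being given
by (4.1) and the latter by:

    $Z_{nm}(f) = m^{1/2} \{ |\chi_{nm}(S_{nm}^{-1/2}(\pi_n f))|^2 - e^{-(\pi_n f \cdot \pi_n f)} \}, \quad f \in F$    (4.6)

For fixed $n$ they converge weakly in $\mathcal{C}^2(K_n)$ and $\mathcal{C}(K_n)$, respectively, to the
centered Gaussian processes $Y^{\mu_n}$ and $Z^{\mu_n}$ having the covariance structures

    $E[Y^{\mu_n}(f) \overline{Y^{\mu_n}(g)}] = \chi_n^{\mu}(f - g) - \chi_n^{\mu}(f) \chi_n^{\mu}(-g), \quad f, g \in F$    (4.7)

    $E[Z^{\mu_n}(f) Z^{\mu_n}(g)] = 4 e^{-\sum_{i=1}^{n} f_i^2 - \sum_{i=1}^{n} g_i^2}
        \Bigl\{ \cosh\Bigl(\sum_{j=1}^{n} f_j g_j\Bigr) - 1 - \tfrac{1}{2} \Bigl(\sum_{i=1}^{n} f_i g_i\Bigr)^2 \Bigr\}, \quad f, g \in F$    (4.8)

provided that conditions (1.7) or (4.4) are satisfied.

Let $\nu_n^Y$ and $\nu_n^Z$ be the probability measures representing the
distributions of the limiting processes $Y^{\mu_n}$ and $Z^{\mu_n}$ respectively.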

Theorem  4.3. 

1) 

=  {(e^(K„),a3e2,K„,.vj;,Yn,n2)n,<ni:ni,n2  €  N) 

=  {(e(Kn),®e(K„).'V^,Yn.n2)n,<n,;ni,n2  €  N) 

are  topological  projective  systems  of  Gaussian  probability  spaces. 
2) The projective limits $(C^2(K_\infty), \mathfrak{B}_{C^2(K_\infty)}, \nu^{Y})$ and
$(C(K_\infty), \mathfrak{B}_{C(K_\infty)}, \nu^{Z})$ of the projective systems in 1) exist and are
unique. ($K_\infty$: the product of infinitely many copies of $[-T, T]$.)

3) Let either the condition (1.7) (for every n) or (4.4) be satisfied. Then for
any sequence $m_n$ such that $m_n \to \infty$ as $n \to \infty$, the process $Y_{n m_n}$ has a
subsequence converging weakly to the first projective limit in 2) along the
first projective system in 1). Furthermore, for such a sequence the process
$Z_{n m_n}$ has a subsequence converging weakly to the second projective limit
along the second projective system.


Proof. 

1)  As  Yn,  n.  are  continuous,  we  only  need  to  show  that  the  measures  v), 
and  are  compatible  with  respect  to  the  mappings  y,, ,  n:  •  We  restrict 
the  proof  to  v^,  a  similar  argument  applies  to  We  have  to  prove 
°Yni'n;  for  n.|  <  The  measure  on  the  righthand-side  is 
obtained  by  the  extension  of  family  of  distributions  of  the  type 

;£(Z“"Mg"').---Z'‘".'(g'*’))l  (4.91 

where  L  (  I  denotes  the  law  of  a  random  vector  and 

g"'  =  (g‘,''.---,gl,\'.o,  --,i))  e  e(K„,), 

(i  =  1 . s;s  =  1.2....). 


But  (4.9)  is  Gaussian  and  completely  determined  by  the  covariance 
structure  given  by  (4.8).  Also  the  right-hand  side  of  (4.8)  (as  well  as  of 
(4.7))  has  the  property  that  projections  on  lesser  dimensions  yield  the 
same  type  of  expressions.  Thus  we  have 

£fZM.,,(glil)2U,.,(glil),  ^ 


which  implies  and  v^.  o  y~,'„,  are  obtained  by  the  extension  of 
the  same  family  of  finite  dimensional  distributions,  hence  they  should 
be  equal. 

2)  Letyn  :(7(K..^)  ->  e(K„)  be  defined  as 


(Yt.gKx)  =  glx^l.g  t  eiK:..: ).  x  €  R", 


y,,  verifies  the  relation  y„,  =  y„,  °Ym'  for  ni  <  and  has  the  three 
properties:  1)  linear;  2)  g  e  e(K^xj),g  0  =?>  3n,  such  that  ynO  0; 
,y;;'(0)  ={0). 

Then  by  [16,  Proposition  11,  pp.  84]  the  projective  limit  topology 
of  C(K:.c,)  has  a  base  of  closed  neighborhoods  consisting  of  the  finite 
intersections  of  the  sets  y~'(Bt.n),c  >  0,  (n  =  l,2,...),B,,n  being 
closed balls in $C(K_n)$. This is usually coarser than the norm topology
of $C(K_\infty)$. But if $V_\varepsilon$ is a closed ball in the norm topology of $C(K_\infty)$, as
$V_\varepsilon = \bigcap_{n} \gamma_n^{-1}(B_{\varepsilon,n})$, the projective limit topology generates the same
$\sigma$-field in $C(K_\infty)$. Since each measure is Radon on $(C(K_n), \mathfrak{B}_{C(K_n)})$
and the projective system obviously satisfies the condition of sequential
maximality, the conclusion follows from [15, Theorem 5, pp. 121]. The
proof for the other process is the same.

3)  Again  restricting  our  discussion  to  Zniii„/  we  first  note  YnldKoc)!  = 
6(Kn),  n=  1 ,2, ...  is  a  direct  result  of  Tietze's  extension  theorem.  Let 
be  the  image  of  on  =  y"' (®e(K„)). i  e.fvfj*  o  y-'  =  v^. 
Since  the  system  in  1)  admits  a  projective  limit  as  given  in  2),  there 
exist  a-extensions  of  to  such  that 

strongly,  i.e.,  IKv^)  —  "v^ll  — )  0,  where  the  norm  denotes  the  to¬ 
tal  variation,  (cf.  [15,  Theorem  8,  pp.  134]).  Therefore  the  family 
[(■Vn)  j.  (n.  =  1,2,...]  is  tight,  implying  that  given  e  >  0,  there  ex¬ 
ists  a  compact  set  Kc  e  (which  is  also  compact  in  the  projective 

limit  topology)  such  that  (Kt)  $  1  ~  2-  denote  the 

measure  induced  on  CjKn  j  by  the  process  Znm„  •  Since  Vn,„„  -4  as 
n  c»,  in  view  of  nondegeneracy  of  there  exists  no  such  that  for 
n  ^  nc,!-v'n,„„(ynKJ  -v^(y„Kc)(  <  f.  On  the  other  hand  we  have 

■VnlYnKJ  =  (-Vnl’lYn'YnK,  1  5  (vyj'lKc)  ^  1  -  2'  This,  along  with 
the  previous  inequality  implies  that  Vn„,„  (y,,  )  $  1  -  c,  for  n  >  no- 

Now  the  conclusion  follows  from  Theorem  3.3.  A  parallel  argument 
applies  to  the  process  Y„,„„. 


Remark 4.4. It can be shown that Hypothesis $R_1$ of Theorems 3.2 and 3.3 is
verified for the above projective systems. Hypothesis $R_2$ is assumed to have
been taken care of as in Theorem 4.1.


Remark 4.5. Without the null hypothesis of normality, the limiting
distributions of the processes $Z_{nm}$ will not in general form a projective system
with $\gamma_{n_1,n_2}$ as functional morphisms. For instance, for arbitrary
distributions $\mu_n$ and under some additional assumptions of independence and finite
fourth moments, the limit $Z^{(n)}$ is found to be Gaussian on $C(K_n)$ with the following
covariances (cf. [6]):

Efz>‘”(f)Z“"(g)l  =2Re{x;;(-f)xS(-g)p(^g)  +  Xn(nx;;(-9)p(-f.g)’M 
f.geR" 


{  (^apar 
where 
p(f,g)=xt;{f  +  g)-x5i(f)xt;(g)

+  ;^{fv^xS(f)Vx;;(g)  +  gv"xS(g)Vx;;(f) 

+  x;i(g)[f(Vx;i(g)]  +xK(f)[gVxli(g)]l  + ... 

It  is  clear  that  taking  projections  will  not  give,  in  the  presence  of  Laplacian 
factors,  the  covariance  structure  of  lower  dimensions. 

5.  On  inference  problems 

In  view  of  the  convergence  of  characteristic  function(al)s  and  weak  conver¬ 
gence  properties  of  the  related  processes,  different  functionals  of  empirical 
processes  lend  themselves  as  potential  statistics  in  inference  problems.  Some 
examples  are: 

1)  Univariate  distributions 

a) For testing the symmetry about the origin in univariate distributions,
the statistic $T_m = \int [\operatorname{Im}\chi_{1m}(t)]^2\, dG(t)$ is suggested [9].
Here G is taken to be a distribution function symmetric about
the origin. When the center of symmetry is not specified, $T_m$ can be
modified as $\inf_a \int [\operatorname{Im}\{e^{iat}\chi_{1m}(t)\}]^2\, dG(t)$.

b) For simple goodness of fit, $R_m = \sqrt{m}\,\max_j |\chi_{1m}(t_j) - \chi_0(t_j)|$,
where $\chi_0$ is a specified univariate characteristic function,
is studied in [10]. A Cramér-von Mises-type statistic,
$\int |\chi_{1m}(t) - \chi_0(t)|^2\, dw(t)$, with w being some weight function
on the line, can be used for the same purpose [11].

c)  For  testing  normality  (with  some  mean  and  variance)  in  one 
dimension,  Murota-Takeuchi  [13]  proposes  ^m(t),(n  =  1), 
(cf.  (1.6)),  evaluated  at  some  point  selected  in  an  interval  [-T,  T], 
i.e.,  Z,„(t)  =  0n{lx)m(:^]l^  - 

2)  Multivariate  distributions 

a) K-sample homogeneity: Let $\Theta_1^{(j)}, \ldots, \Theta_{m_j}^{(j)}$, $j = 1, \ldots, K$, $K \ge 1$, be
a set of K independent observations with sizes $m_1, \ldots, m_K$ of K
n-dimensional random vectors $\Theta^{(1)}, \ldots, \Theta^{(K)}$ with corresponding
characteristic functions $\chi^{(1)}(t), \ldots, \chi^{(K)}(t)$. Let $\chi^{(j)}_{n m_j}(t)$ be the
empirical characteristic function of the j-th sample. Then the
characteristic homogeneity process

$$S_N(t) = \sum_{j=1}^{K} a_j(N)\,\sqrt{m_j}\;\chi^{(j)}_{n m_j}(t)$$

where $N = (m_1, \ldots, m_K)$ and $a_j(N)$ are properly selected constants,
can be used to test homogeneity [5].

b)  Independence:  For 

n 

33  ^  2)Stn(t)  =  \/m((Xnm  (t)  Xnm,k(tlc)), 

k=1 

t  =  (tj  ,  .  .  .  ,  tji), 

(the  empirical  characteristic  independence  process)  is  proposed 
to  test  the  independence  of  the  components  of  ©.(Xnm,k(tk)  = 
Xnm(0,tk,0):  the  empirical  characteristic  function  of  the  k-th 
components)  [5], 

c)  Testing  for  normality  in  arbitrary  dimensions;  Various  extensions 
of  n-dimensional  Murota-Takeuchi  statistic,  as  given  by  (1.6)  or 
(1.8),  can  be  proposed. 

i)  Consider  nonzero  vectors  in  a  neighborhood 

of  the  origin,  such  that  the  L  x  L  matrix  [pti.t,]  be  non¬ 
singular.  (Ps.t  is  given  by  (1.9)).  Then  the  quadratic 
form  Qm  =  Qm(ti,  -.tL)  =  2;„R“'z„.,  where  = 
(rm(t)),  -  ,Zm(tL))  with  an  asymptotic  chi-square  distri¬ 
bution  of  L  degrees  of  freedom  under  the  null-hypothesis, 
can  be  used  to  test  normality. 

ii)  For  nonzero  pairwise  orthogonal  n-dimensional  vectors 

ti,--,t„ 

=max{|Z,„(t,)|.--,|Z,n(t,„)|} 

forms  another  extension  of  the  Murota-Takeuchi  statistic. 

iii) 

Mi:'(T)=  sup  |Z„,(t)| 

tel-T.Ti'' 

=  sup  IIXnm(Sm'^t)|^  - 

lel-T.TI" 

is  a  refinement  of  ii.  Procedures  to  estimate  or  approximate 
the  critical  tail  values  of  ii  and  iii  are  suggested  in  [6]. 
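
As a rough illustration of how such functionals are computed in practice, the following Python sketch evaluates a univariate empirical characteristic function and a discretized version of the symmetry statistic of item 1a. It is only a sketch under stated assumptions: the function names are hypothetical, the integral against dG is replaced by an equal-weight average over a finite grid symmetric about the origin, and no normalizing factor in m is applied.

```python
import cmath


def ecf(sample, t):
    """Empirical characteristic function: the average of exp(i*t*x) over the sample."""
    return sum(cmath.exp(1j * t * x) for x in sample) / len(sample)


def symmetry_statistic(sample, grid):
    """Discretized symmetry statistic: mean of [Im ecf(t)]^2 over the grid.

    The equal weights on `grid` stand in for dG(t); this choice is an
    assumption of the sketch and is not fixed by the text.
    """
    return sum(ecf(sample, t).imag ** 2 for t in grid) / len(grid)


grid = [k / 4 for k in range(-8, 9)]        # symmetric grid on [-2, 2]
symmetric = [-1.5, -0.5, 0.5, 1.5]          # sample symmetric about the origin
shifted = [x + 1.0 for x in symmetric]      # same sample with its centre moved to 1

print(symmetry_statistic(symmetric, grid))  # essentially zero
print(symmetry_statistic(shifted, grid))    # clearly positive
```

For a sample that is exactly symmetric about the origin the imaginary part of the empirical characteristic function vanishes, so the statistic is zero up to rounding; a shift of the centre makes it positive.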


Any kind of inferential study regarding unknown distributions on sequence
spaces, such as the distribution induced by a non-stationary process,
has to be carried out in finite dimensions, with some large n. The rationale
behind doing this by utilizing the empirical characteristic functionals and
the related processes is provided by Theorems 2.3 and 4.3. According to
Theorem 2.3, $\lim_{n\to\infty} \chi_n(f) = \chi^{\mu}(f)$, $f \in F$, and the convergence is uniform
on compact subsets of F for certain dual pairs (E, F), including $(\ell_p, \ell_q)$. On
compact subsets of $\ell_q$, Hölder's inequality yields:


$$|\chi_n(f) - \chi^{\mu}(f)| \le \int \Big|\exp\Big\{ iM \Big(\sum_{k=n+1}^{\infty} |x_k|^p\Big)^{1/p} \Big\} - 1\Big|\, d\mu(x) \tag{5.1}$$


for  some  constant  M  >  0.  (Similar  inequalities  hold  for  other  dualities). 

In some special cases (5.1) can be utilized (at least in principle) to
approximate $\chi^{\mu}(f)$ uniformly on compact subsets of F. Two such cases
would be:


1) The image of $\Theta$ in $\ell_p$ is almost surely contained in some particular
compact set, e.g., subsets of the type
$K_{M,\kappa} = \{x : x \in \ell_p,\ |x_n|^p n^{\kappa} \le M,\ n = 1, 2, \ldots;\ \kappa > 1\}$.

2) $\mu$ is a Gaussian distribution with a given covariance operator. If for
some $n_0$, $n \ge n_0 \Rightarrow |\chi_n(f) - \chi^{\mu}(f)| < \varepsilon$, then functionals of the
empirical processes indexed by $\mathbb{R}^{n_0}$ and given under 1 and 2 above can be
used for different inference problems.


The following iterated logarithm result [4] may give a clue for the right
sample size in each dimension:

$$\limsup_{m\to\infty} \Big(\frac{m}{2\log\log m}\Big)^{1/2} \sup_{t\in[-T,T]^n} |\chi_{nm}(t) - \chi_n(t)| = K \quad \text{a.s.}$$

with $K = \sup\{\sup_{t\in[-T,T]^n} |k(t)| : k \in \mathcal{K}_{\mu_n}\}$, where $\mathcal{K}_{\mu_n}$ is the
generalized Finkelstein set corresponding to the distribution $\mu_n$, which has the
characteristic function $\chi_n(t)$.

For testing normality with an arbitrary covariance operator, the generalized
Murota-Takeuchi statistic of 2-c-i as applied to $[-T, T]^n$ will be suitable.
In this regard, a sequential scheme of tests of normality in an increasing
number of dimensions, using the expanding set of data given by (1.1), can also
be considered. The acceptance and rejection of the hypothesis of normality in
the sequence space should be based, in some way, on the length of runs of
acceptance and rejection in finite dimensional tests. However, one should
expect a substantial difficulty in introducing measures of performance for
such a test.

We wish to look at using probabilistic methods in what some may view
as  the  "strange  new"  mathematics.  If  nature  uses  probability  in  quantum 
physics,  then  why  shouldn't  mathematicians?  The  possibilities  abound. 
Perhaps "undecidable propositions" should be looked at as statements with
probabilistic truth values, perhaps "NP-complete" problems should be attacked
probabilistically, and perhaps "chaos" is merely the picture of probability
in what used to be thought of as "deterministic reality."

Or  perhaps  this  is  all  nutty,  but  let's  have  a  look. 


1.  Introduction

The  dawn  of  Quantum  Theory  shook  the  scientific  world  with  the  stunning 
message  that  the  elementary  particles  were  really  not  stuff  at  all,  but  "pieces 
of  probability."  Believe  it  or  not,  understand  it  or  not,  this  description  of 
the  elementary  particles  as  having  only  a  probabilistic  reality  is  a  highly 
functional  and  useful  picture  of  the  subatomic  universe.  It  works! 

The  great  Albert  Einstein  never  "believed"  this  probability  picture  of 
matter,  but  that  didn't  stop  his  using  it  with  incredible  effectiveness.  Indeed, 
it  was  for  this  that  he  earned  his  Nobel  prize:  the  prize  was  given  him  for 
the  photoelectric  effect,  not  for  the  monumental  relativity  theory. 

So,  this  second  20th-century  revolution  in  physics  carried  the  message 
that  the  truth  in  science  was  more  "fuzzy"  than  had  been  thought.  The  first 
revolution — relativity  theory — merely  said  that  we  had  been  mistaken  in  our 
picture  of  truth,  but  did  not  deny  the  sharpness  of  it.  Truth  under  Einstein 
was  just  as  sharp  and  hard-edged  as  it  had  been  under  Newton.  But  truth 
under  Bohr  and  the  quantum  theory  gang  was  different,  fuzzy,  soft-edged — 
and  very  involved  with  probability  theory.  This  is  not  to  say  that  physicists 
were unable to get answers to problems. A whole new mechanics, quantum
mechanics, evolved and answers continued to be cranked out. A new attitude,
but business as usual.

This new attitude seemed to be everywhere. The great writer Virginia
Woolf made an important discovery about one of her characters, a
Mrs. Brown. "My name is Brown," said she, "catch me if you can." The
important discovery of Woolf was that you can't! Mere words were insufficient
to catch anything as overwhelming as a human character. But Woolf's
acknowledgement of this impossibility was a tremendous breakthrough! She
showed us the infinitude of the human soul. A powerful message!

Even the pristine Queen, Mathematics herself, succumbed to the new
order. Old things we counted on gave way to the new fuzzier ones. One
old thing that gave way, under Gödel, was decidability. The cherished
belief had always been that a meaningful mathematical statement was either
provably true or provably false. This belief, however attractive and
"convincing," simply had to be given up when an example was produced which
was meaningful, but not provably true and not provably false! When this
example was translated into its arithmetical counterpart, it turned out to be so
convoluted and endlessly "boring" that mathematicians simply laughed and
scoffed. "Oh, who cares about that silly isolated counterexample," they said.

You'd think that mathematicians would know better! You'd think
they learned their lesson from Pythagoras. The irrationality of the "silly
number" √2 also appeared to be an isolated example. Now we know the
irrationals are all about us; they are in every sense more numerous than the
rationals. No, no, mathematicians should never make fun of the isolated
counterexample. It will, like √2 did, and like the Gödel undecidable,
emerge as the overwhelming norm. Yes, it is now known that most statements
are undecidable. Our cherished provably true (or provably false) statements
are the silly ones, being in the infinitesimal minority.

Gödel's example, aside from its 100 pages of very ponderous details
which show that it is a meaningful statement (in the system of integers),
is reducible to a Kindergarten version which the reader might enjoy. He also
would probably be horrified by it because it appears dangerously close to
the obviously contradictory and senseless childhood joke, "This statement
is false." But this close resemblance is really not so significant. The words
"true" or "false" are not really definable in the system, whereas the word
"provable," meaning having a proof, is definable, as a moment's thought
(or 100 pages of genuine mathematics) will convince you. At any rate, our
Kindergarten Gödel statement is:

This  statement  is  unprovable. 


which  we  have  "framed"  for  reference's  sake.  We  will  call  it  the  "statement 
in  the  frame"  (SIF)  and  at  other  times  we  will  read  it  and  observe  what  it  says. 
So, now for the undecidability proof:

STEP  1  Suppose  the  SIF  had  a  proof.  Then  it  would  be  true,  but  taking  it 
out  of  its  frame,  it  would  be  unprovable  (because  that's  what  it  says).  This 
is  a  contradiction!  So,  it  cannot  have  a  proof. 

STEP  2  We  now  know  from  step  1  that  the  SIF  cannot  have  a  proof,  so 
taking  it  out  of  its  frame,  we  see  that  it  is  making  a  true  statement.  So,  it 
is  true. 

STEP 3 Since the SIF is true, by step 2, we may conclude that the SIF cannot
be proven false.

Okay  then,  steps  1  through  3  do  the  job.  Step  1  shows  that  the  SIF 
cannot  be  proved  true  and  step  3  shows  that  the  SIF  cannot  be  proved  false. 
So,  indeed  the  SIF  is  an  undecidable  proposition! 

A general underlying theme seems to run in the examples thus far: that
the sharp, hard-edged notion of truth isn't adequate any longer. Indeed, it
never was. Elementary particles existed long before their subtleties were
recorded by the quantum theorists, Mrs. Browns were indescribable long
before Virginia Woolf pointed it out to us, and undecidable propositions
were always so.

The  probability  aspect  of  this  "fuzz"  is  not  really  apparent,  except 
perhaps  in  quantum  theory,  where  even  Einstein  had  some  doubts.  But  let 
us  turn  now  to  other  examples  where  probability  is  more  obvious. 

There is much current interest in J.-P. Kahane's construction of the
ultraflat polynomials. He answers thereby many questions of Hardy, Littlewood,
and Erdős by a clever use of probability methods in Fourier analysis. At any
rate, Kahane produced a polynomial of each degree with certain remarkable
properties; the word "produced" must be emphasized because his methods
involve probabilistic, and therefore not explicit, constructions. To some of
us, this nonexplicitness is quite acceptable, and in fact quite beautiful. It
is very exciting to prove the existence of something which nobody has the
vaguest notion of how to locate!

Thus, a real number exists which is "normal" in every base. Expanded
in base 2, it has asymptotically as many 0's as it does 1's; in base 3, as many
0's as it does 1's or 2's; in base 4, as many 0's as it does 1's, 2's, or 3's, and so
on. Such a real number exists because almost all numbers have this property
(the probability argument!), but so far no such number has been explicitly
produced. It is even conceivable that no such explicit number can ever be
produced. Undoubtedly, numbers such as π or √2 are normal in every base,
but it may well be undecidable that they are so.

Probability methods are in fact the direct successors of counting
arguments, and the nonconstructive nature of these is often quite amusing. For
instance, one can prove that there are two people in New York City with the
same number of hairs on their heads (there are more than 7,000,000 people
there and there are fewer than 200,000 hairs on any head). This counting
argument is an excellent example of a situation where explicit construction is
impossible. Even the "totally bald" heads have a few hundred tiny hairs on them
and the sought-after couple is a time-varying function, hairs constantly
growing in and falling out. Explicit construction is impossible, but the counting
argument is absolutely convincing!
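
The counting argument can be watched in miniature. The Python sketch below is only an illustration under stated assumptions: the function name is hypothetical, the hair counts are random numbers rather than data, and the city is scaled down to 201 people with fewer than 200 hairs each so that it runs instantly. Unlike the real city, the program is handed everybody's hair count, which is exactly the explicit knowledge the argument itself never needs.

```python
import random


def find_matching_pair(hair_counts):
    """Return indices (i, j) of two people with equal hair counts, or None."""
    first_seen = {}
    for person, count in enumerate(hair_counts):
        if count in first_seen:
            return first_seen[count], person
        first_seen[count] = person
    return None


# Hypothetical scaled-down city: more people than possible hair counts.
max_hairs = 200
population = [random.randrange(max_hairs) for _ in range(max_hairs + 1)]

pair = find_matching_pair(population)
print(pair is not None)  # a matching pair must exist, whatever the random draw
```

With more people than possible counts, the search can never come back empty; the pigeonhole principle guarantees it before a single hair is counted.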

Erdős loves to stand up in a crowded room and boldly announce:
"There are two people here who have exactly the same number of friends in
this room." This proof, albeit a counting argument, has a slight twist to it
(a slight nontriviality, of course, coming from Erdős), but again it is not at
all based on any explicit knowledge of the people involved. We know much
more than we know!
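
For readers who would like to see the announcement tested, here is a small Python sketch. It assumes, as the argument does, that friendship is mutual; the random friendship model, the function names, and the probability 0.5 are hypothetical choices made only for the experiment. The twist is that in a room of n people the possible friend counts are 0, 1, ..., n-1, but 0 and n-1 cannot both occur (someone who is friends with everybody rules out someone who is friends with nobody), so n people share at most n-1 values.

```python
import random


def friend_counts(n, friendship_probability=0.5, seed=None):
    """Draw a random mutual-friendship relation on n people; return each friend count."""
    rng = random.Random(seed)
    counts = [0] * n
    for a in range(n):
        for b in range(a + 1, n):
            if rng.random() < friendship_probability:
                counts[a] += 1  # friendship is mutual, so both counts go up
                counts[b] += 1
    return counts


def has_repeated_count(counts):
    """True when at least two people have the same number of friends."""
    return len(set(counts)) < len(counts)


results = [has_repeated_count(friend_counts(n, seed=n)) for n in range(2, 40)]
print(all(results))
```

Every room in the experiment contains such a pair, as the counting argument says it must.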

So,  sometimes  the  probability  argument  is  a  pure  delight  and  we 
couldn't  desire  any  more.  Who  cares,  after  all,  which  two  New  Yorkers 
have  the  same  number  of  hairs  on  their  heads?  But  there  are  times  when  we 
do  desire  more:  we  would  love  to  see  an  explicit  display  of  a  Kahane  ultraflat 
polynomial — and  at  least  one  explicit  example  of  a  normal  number. 

There is, however, one area in mathematics where the probabilistic is
the only choice, where any constructive choice is, by its very nature,
counterindicated. This is in game theory, where much of the battle is to make
unpredictable moves!

In Dawkins's very nice book, The Selfish Gene, a remarkable connection
is disclosed between game theory and evolution, another demonstration of
how our very existence is connected to probabilistic mathematics. Dawkins
points to an actual species of rodents which are competitive for their food
supply.  The  game  is:  to  fight  or  not  to  fight.  If  both  participants  elect  to 
not  fight  (i.e.,  they  are  pacifists),  then  the  food  is  shared.  If  one  decides 
to  fight  (i.e.,  is  a  warrior)  and  the  other  decides  to  not  fight,  then  the  food 
goes  entirely  to  the  warrior.  And  if  both  decide  to  fight,  then  both  lose  by 
being  hurt  in  their  battle.  Note  then  that  this  is  not  a  zero-sum  game,  but 
nonetheless  there  is  a  solution.  With  the  appropriate  numerical  parameters 
estimated,  this  solution  turns  out  to  be  roughly:  be  a  warrior  32%  of  the 
time,  and  be  a  pacifist  68%  of  the  time.  Remember,  we  are  describing  an 
actual  species  and  actual  parameters.  What  is  amazing  is  that  in  this  species 
it  is  observed  that  32%  of  the  members  are  warriors  and  68%  are  pacifists! 

No  individual  varies  his  play — the  species  as  a  whole  solved  the  game. 
Evolution  does  probability! 
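
To see where a figure like 32% could come from, here is a minimal Python sketch of the standard hawk-dove payoff model. Everything numerical in it is an assumption: the text does not give the estimated parameters, so the food value 32 and the injury cost 100 are hypothetical numbers chosen only so that the equilibrium lands on 32%, and the function names are ours. In this model the stable mixture is the warrior fraction at which warriors and pacifists do equally well, which works out to value/cost when the cost exceeds the value.

```python
def warrior_payoff(p, value, cost):
    """Expected payoff to a warrior when a fraction p of opponents are warriors."""
    # Against a warrior: split the prize but pay the injury; against a pacifist: take all.
    return p * (value - cost) / 2 + (1 - p) * value


def pacifist_payoff(p, value, cost):
    """Expected payoff to a pacifist when a fraction p of opponents are warriors."""
    # Against a warrior: get nothing; against a pacifist: share the food.
    return (1 - p) * value / 2


def stable_warrior_fraction(value, cost):
    """Mixed equilibrium of the model: value / cost, valid when cost > value > 0."""
    return value / cost


value, cost = 32.0, 100.0  # hypothetical parameters, not taken from the text
p_star = stable_warrior_fraction(value, cost)
print(p_star)
print(warrior_payoff(p_star, value, cost), pacifist_payoff(p_star, value, cost))
```

At the equilibrium fraction the two printed payoffs coincide, so neither kind of individual gains by switching; that balance is what a population of fixed warriors and fixed pacifists can settle into without any individual varying its play.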


2.  Probability  in  our  thought  processes 

Whether  or  not  we  are  conscious  of  it,  our  reasoning  is  at  least  partially 
subject  to  randomness.  The  longer  and  more  complicated  a  proof,  the  less 
we  really  do  believe  it.  Intuitively,  we  don't  fully  (i.e.,  with  probability  p  =  1) 
believe any single step, so that a combination of, e.g., 100 steps has only a
probability $p^{100}$, and this might be quite far from 1.
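
A few lines of Python make the arithmetic concrete. This is only an illustration: it assumes the steps are believed independently and with equal confidence, which is the simplification made above, and the function name and the sample confidences are hypothetical.

```python
def chain_confidence(step_confidence, steps):
    """Confidence in a whole argument when every step is believed independently."""
    return step_confidence ** steps


for p in (0.999, 0.99, 0.9):
    print(p, chain_confidence(p, 100))
```

Even a belief of 0.99 in each step leaves only about a one-in-three belief in a hundred-step conclusion.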

My favorite example of this built-in probability-in-reasoning is the
famous short story of Robert Louis Stevenson's, "The Bottle Imp". According
to the story, there is an Imp, a magical wish-granting genie, trapped
inside  a  bottle.  If  you  buy  this  bottle,  you  own  the  Imp  and  it  will  grant  you 
wealth,  love,  and  power.  The  catch  is  that  you  must  sell  it  before  you  die  for 
strictly  less  than  you  bought  it  for,  or  else  you  are  doomed  to  roast  in  hell  for 
all  eternity. 

To  make  this  precise,  let  us  mandate  that  the  purchase  price  of  the  Imp 
must  be  a  positive  integer  number  of  cents,  American  funds. 

SO: Would you buy it for 1¢? No, of course not. You could not ever sell it
for less than you bought it for; you would surely lose!

NEXT: Would you buy it for 2¢? (Now the thought process is slower.) But
again the answer is "No." By the previous argument you know that nobody
would ever buy it from you for 1¢. So, indeed you still could never sell it
before you die.

The mathematician sees the obvious induction in the above and (sort
of) agrees that therefore he would never buy the Imp for any n¢.

BUT: I would surely buy this Imp for $1,000.00, and feel totally sure of
selling it for, say, $999.00.

Nobody  really  believes  a  proof  of  100,000  steps!  (And  rightfully  so,  if 
nobody  believes  a  proof  of  99,900  steps!) 

(In the actual Stevenson tale, the protagonist buys the bottle for (the
equivalent of) 2¢ and is able to sell it to a drunken sailor for 1¢. The sailor
is delighted, since he knows that he is already damned to spend eternity
in hell!)

There  is  another  example  of  lack  of  belief  in  a  long  proof.  This  is  the 
infamous  four-color  problem,  which  has  been  solved  by  a  computer.  A 
computer  proof  (?)  based  on  firings — and  perhaps  misfirings — of  electrons? 
Why  should  I  believe  such  a  proof,  especially  this  proof  which  has  had  to  be 
"repaired"  several  times  when  errors  were  found? 

These  are  all  valid  objections,  but  then  again  are  firings  and  misfirings 
of electrons in a machine any less convincing than firings or misfirings of
neurons  in  a  human  brain?  Are  any  long  proofs  by  a  computer  or  from  a 
person  totally  believable?  Or  are  they  all  just  probable? 

Are there any mathematical truths, or only things of probability
$1 - 10^{-100}$? Perhaps anything with probability $\ge 1 - 10^{-100}$ is as true as you
can ever get.

So, truth is like elementary particles. It isn't stuff at all, but only
probability. Yes, Mr. Einstein, not only does God play dice, God is dice!
3.  P  =  NP?

We hinted at the "mentality" of the computer in connection with the
four-color problem, but major questions remain: Does a computer have
mentality? Does it have consciousness? Does it have strokes of genius? This last
question has been formalized into the well-known P = NP problem. In
simplified nontechnical terms, this problem exemplifies the familiar situation in
which  we  say  to  ourselves,  "Oh,  of  course;  why  didn't  I  think  of  that?"  It  is 
the  situation  where  an  answer  to  a  question  is  so  easy  to  check  that  we  feel 
that  it  should  have  been  easy  to  find. 

(Is 4294967297 composite? Yes, it's divisible by 641. Oh, why didn't I
see that? I don't know, Mr. Fermat; you should have.)

At  any  rate,  this  is  the  P  =  NP  problem.  Can  all  easily  checked  answers 
to  a  question  be  easily  discovered? 
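
The Fermat example can be turned into a toy experiment in Python. It is only an illustration of the gap between checking and finding for this one number: trial division is a deliberately naive search chosen for clarity, the function names are hypothetical, and nothing here bears on how the general P = NP question will be settled.

```python
def divides(d, n):
    """Checking a proposed answer: a single remainder computation."""
    return n % d == 0


def smallest_factor(n):
    """Finding an answer by trial division.

    Returns the smallest factor greater than 1 together with the number of
    candidates that had to be tried before it turned up.
    """
    d, tried = 2, 0
    while d * d <= n:
        tried += 1
        if n % d == 0:
            return d, tried
        d += 1
    return n, tried


n = 2**32 + 1  # the number 4294967297 from the text
print(divides(641, n))
print(smallest_factor(n))
```

The check is one division; the naive search has to grind through every candidate below 641 first. For larger numbers the gap between the two kinds of work widens enormously.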

The thought process we want is a little like: "Why don't I try . . . Oh,
of  course,  it  works!"  In  short,  if  we  guarantee  the  "Oh,  of  course,  it  works," 
then  all  we  need  supply  is  the  "Why  don't  I  try. . ."  but  that's  the  genius  part; 
that's  the  flash  of  the  idea.  There  seems  little  reason  to  believe  that  the  hunch 
presents  itself  to  us  just  because  it  will  be  easy  to  check — except  that  we  have 
all  seen  it  happen  and  too  often  to  consider  it  an  accident.  It  is  with  great 
delight  that  we  witness  strokes  of  genius — even  when  (as  is  usual)  they  are 
from  others. 

•  When a 12-year-old boy makes a queen sacrifice and wins a championship
chess game, all the world rejoices (well, except for the one who
lost to him)!

•  When  Sue  Shapiro  (a  friend  of  the  family)  unerringly  picks  out  the 
four-leaf  clover  from  a  field  of  clover,  the  witnesses  are  both  stunned 
and  delighted. 

•  When that little boy on television solves all those complicated mazes
with no apparent effort, we are again quite delighted.

These  are  perfect  examples  of  solutions  which  are  easy  to  check.  Once 
made,  Bobby  Fischer's  brilliancy  of  the  century  is  quickly  seen  to  force  a 
checkmate.  Once  drawn,  a  successful  maze  path  is  also  quickly  seen  to 
succeed.  A  four-leaf  clover  is  a  palpable  reality.  There  are,  of  course,  many 
other examples. There seem to be very special people with very special
talents,  but  what  is  being  suggested  is  that  yes,  P  =  NP — that  any  question 
which  has  an  easily  checked  solution  also  has  an  easy  path  to  that  solution 
(the  discovery  process  being  available  to  a  special-purpose  mind). 

And  now  comes  the  fuzziness.  We  seem  to  be  on  the  brink  of  another 
undecidability. If indeed P = NP, we undoubtedly could never prove it. For
a proof would involve a production of an algorithm (the proof itself), and
such an algorithm could be programmed into a computer. The computer
could have strokes of genius! The great boon to mankind! Of course, this is
the reason for the great interest in the P = NP problem in the first place.

Well,  why  not?  It's  about  time  that  something  good  happened  to 
mankind. Why not have the computer become a true thinker? ('tis a
consummation devoutly to be wished)

The negative answer, however, is very convincingly put forth in Penrose's
marvelous book, The Emperor's New Mind. Penrose's thesis is simply
that computers are not, and never can become, conscious, and that therefore
they can never become true thinkers.

We seem to be led to the conclusion that P = NP is undecidable. It is,
as with the SIF of Gödel, true (because of the Bobby Fischers), but unprovable
(because of Penrose's thesis).

4.  Chaos:  The  third  20th  century  revolution  in  science 

The  message  of  this  paper  and  of  this  "new  science"  is  that  we  are  living  not 
in  a  universe  of  sharp,  clear  realities,  but  rather  in  one  of  probabilities  (what 
we  have  been  calling  fuzzy  instead  of  what  we  have  been  calling  sharp).  The 
germ  of  this  new  attitude  came  from  the  discovery  of  huge  discontinuities 
in  nature:  the  butterfly  effect,  in  popular  terminology.  The  scenario  is  that 
of  a  butterfly  flapping  its  wings  in  Tokyo  causing  a  slight  dislocation  of 
air  particles,  thereby  causing  a  slight  motion  of  leaves  on  a  tree,  and  so  on 
and  on,  . . .,  ending  in  a  cyclone  somewhere.  The  point  is  that  such  a  slight 
perturbation  could — and  often  does — cause  large  changes  at  large  distances. 
The  mathematical  notion  of  discontinuity  is  certainly  not  a  new  one.  What 
is  new  is  the  realization  that  discontinuity  is  prevalent  in  nature. 

If  indeed  the  flutter  of  a  butterfly's  wing  in  Tokyo  could  cause  a  cyclone 
in  Timbuktu,  then  we  must  conclude  that  that  cyclone's  cause  was  probability. 
It  was  not  a  large  physical  force  that  caused  the  butterfly  to  wave  its  wings, 
but  only  a  random  (probability!)  whim.  The  butterfly  effect  was  possibly 
first discovered by Edward Lorenz, who was doing some computer experiments
in meteorology. Lorenz would feed into a computer some descriptive
weather  parameters  and  then  read  out  some  weather  predictions.  Quite  by 
accident,  he  fed  in  some  parameters  which  differed  only  very  slightly  from 
the  ones  he  had  used  the  week  before,  and  was  shocked  to  find  that  the 
new  week's  predictions  were  markedly  different  from  the  earlier  ones.  "I 
only  changed  the  parameters  in  the  fourth  decimal  place;  no  instruments 
ever  detect  better  than  that."  If  such  "unnoticeable"  errors  produce  such 
an  enormous  cumulative  effect,  then  we  have  been  all  wrong  looking  for  a 
sensible  method  of  weather  prediction!  It  doesn't  even  exist  in  nature! 
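
The sensitivity described above can be reproduced in a few lines. The sketch below does not use Lorenz's weather model; as a stand-in it iterates the logistic map x -> 4x(1 - x), a standard toy example of the same phenomenon, and the starting value 0.2 and the perturbation in the fourth decimal place are illustrative assumptions.

```python
def logistic_orbit(x0, steps, r=4.0):
    """Iterate x -> r*x*(1-x) and return the whole orbit as a list."""
    orbit = [x0]
    for _ in range(steps):
        orbit.append(r * orbit[-1] * (1.0 - orbit[-1]))
    return orbit


def separation(x0, delta, steps):
    """Distance, step by step, between two orbits started delta apart."""
    a = logistic_orbit(x0, steps)
    b = logistic_orbit(x0 + delta, steps)
    return [abs(u - v) for u, v in zip(a, b)]


if __name__ == "__main__":
    # Two "forecasts" whose inputs differ only in the fourth decimal place.
    gaps = separation(0.2, 1e-4, 60)
    for n in (0, 5, 10, 15, 20, 40, 60):
        print(n, gaps[n])
```

The printed gaps start at about one ten-thousandth and grow until they are comparable to the size of the interval [0, 1] itself, after which the two orbits are unrelated.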

Lorenz went on to build his famous mathematical model, which
showed the truly explosive property of the iterative procedures present in
nature. Lorenz's finding was not restricted just to meteorology. Similar
phenomena were discovered in, e.g., the gypsy moth population, shapes
of  clouds,  the  intertwining  of  blood  vessels,  heart  troubles — even  the  stock 
market  variations.  The  word  spread  like  wildfire!  All  is  discontinuity,  and 
hence all is probability.

Positive-definite functions or distributions appear naturally in the theory
of homogeneous random fields and, in particular, in the definition of their
correlation functionals. We consider the problem of extending a radial
positive-definite function, or distribution, defined in a ball centered at the
origin of $\mathbb{R}^n$ to one defined in the whole space. Such an extension was
shown to exist by W. Rudin (in the case of a continuous function) and,
later on, by A.E. Nussbaum (in the case of a distribution). Our goal in this
paper is to explain how the maximum entropy principle can be used to
obtain explicit solutions of the n-dimensional radial extension problem via
a reduction to a one-dimensional one. We also investigate the question of
uniqueness of the extension.


1.  Notation 

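We will denote by $\mathbb{R}^n$ the n-dimensional Euclidean space and by $\hat{\mathbb{R}}^n$ its dual
group. If $f$ is a function defined on $\mathbb{R}^n$, we will denote by $\tilde{f}$ the function
defined by $\tilde{f}(x) = \overline{f(-x)}$, for $x$ in $\mathbb{R}^n$. If $f : \mathbb{R} \to \mathbb{C}$ is a function, we denote its
Fourier transform by $\hat{f}$ or $\mathcal{F}_1 f$. It is defined by

$$\hat{f}(\gamma) = (\mathcal{F}_1 f)(\gamma) = \int_{\mathbb{R}} e^{-2\pi i \gamma x} f(x)\,dx, \qquad \gamma \in \hat{\mathbb{R}}.$$

If $f : \mathbb{R}^n \to \mathbb{C}$, its Fourier transform $\mathcal{F}_n f : \hat{\mathbb{R}}^n \to \mathbb{C}$ is defined by

$$(\mathcal{F}_n f)(\xi) = \int_{\mathbb{R}^n} e^{-2\pi i \langle \xi, y \rangle} f(y)\,dy, \qquad \xi \in \hat{\mathbb{R}}^n.$$

Note that we reserve the notation $\hat{f}$ for functions $f$ defined on $\mathbb{R}$. If $g : \hat{\mathbb{R}}^n \to \mathbb{C}$,
its inverse Fourier transform is

$$(\mathcal{F}_n^{-1} g)(x) = \int_{\hat{\mathbb{R}}^n} e^{2\pi i \langle x, \xi \rangle} g(\xi)\,d\xi, \qquad x \in \mathbb{R}^n.$$

Of course, as usual, the Fourier transform is first defined for functions
in the Schwartz class $\mathcal{S}(\mathbb{R}^n)$ and then extended, by duality, to the class $\mathcal{S}'(\mathbb{R}^n)$ of
tempered distributions. We will denote by the bracket $\langle T, \phi \rangle$ the duality
between a distribution (or a generalized random field) $T$ and a test function
$\phi$. $C_0^\infty(\Omega)$ is the class of infinitely differentiable functions with compact
support in $\Omega$. $B_n(R)$ and $S_n$ are defined by $B_n(R) = \{x \in \mathbb{R}^n,\ |x| < R\}$
and $S_n = \{\xi \in \mathbb{R}^n,\ |\xi| = 1\}$. We also denote by $d\sigma$ the (n - 1)-dimensional
Lebesgue measure on $S_n$ and, if $\phi$ is a function on $\mathbb{R}^n$, we define its spherical
average $\phi^\circ$ by the formula

$$\phi^\circ(x) = |S_n|^{-1} \int_{S_n} \phi(|x|\sigma)\,d\sigma, \qquad x \in \mathbb{R}^n.$$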
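
To make the spherical average concrete, here is a minimal numerical sketch in the plane (n = 2), where the unit sphere is the unit circle and $|S_2| = 2\pi$. The function name `spherical_average_2d`, the midpoint quadrature, and the test function are assumptions made for illustration only.

```python
import math


def spherical_average_2d(phi, x, samples=360):
    """Approximate |S_2|^{-1} * integral of phi(|x| sigma) over the unit circle.

    With sigma = (cos(theta), sin(theta)) and d(sigma) = d(theta), the
    normalized integral is just the mean of phi over equally spaced angles.
    """
    r = math.hypot(x[0], x[1])
    total = 0.0
    for j in range(samples):
        theta = 2.0 * math.pi * (j + 0.5) / samples
        total += phi((r * math.cos(theta), r * math.sin(theta)))
    return total / samples


if __name__ == "__main__":
    # phi(x) = x_1^2 + 3 x_2 is not radial; its average over |x| = r is r^2 / 2.
    phi = lambda x: x[0] ** 2 + 3.0 * x[1]
    print(spherical_average_2d(phi, (3.0, 4.0)))
```

A radial function is left unchanged by this averaging, which is the sense in which a distribution $Q$ with $\langle Q, \phi \rangle = \langle Q, \phi^\circ \rangle$ is called radial.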


2.  Introduction 

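Let us consider a complex-valued generalized random field $\Phi$ defined on $\mathbb{R}^n$.
If we pick any $\phi_1, \ldots, \phi_n \in C_0^\infty(\mathbb{R}^n)$, then $\langle \Phi, \phi_1 \rangle, \ldots, \langle \Phi, \phi_n \rangle$ are random
variables with a well-defined joint probability distribution. We will assume
in the following that, for all $\phi \in C_0^\infty(\mathbb{R}^n)$, the expected value $E(\langle \Phi, \phi \rangle) = 0$
and also that $E(|\langle \Phi, \phi \rangle|^2) < \infty$. The generalized random field $\Phi$ is called
homogeneous if, for any $\phi_1, \ldots, \phi_k \in C_0^\infty(\mathbb{R}^n)$ and any point $h \in \mathbb{R}^n$,
the k-dimensional random variables $(\langle \Phi, \phi_1 \rangle, \ldots, \langle \Phi, \phi_k \rangle)$ and
$(\langle \Phi, \phi_1(\cdot + h) \rangle, \ldots, \langle \Phi, \phi_k(\cdot + h) \rangle)$ are identically distributed. This is, of course, the
n-dimensional analogue of a generalized stationary stochastic process (see
[9, 16, 17] for more details). The correlation functional of the generalized
random variable $\Phi$ is the sesquilinear form $B$ defined on $C_0^\infty(\mathbb{R}^n) \times C_0^\infty(\mathbb{R}^n)$
by the formula

$$\forall \phi_1, \phi_2 \in C_0^\infty(\mathbb{R}^n), \quad B(\phi_1, \phi_2) = E\big(\langle \Phi, \phi_1 \rangle\, \overline{\langle \Phi, \phi_2 \rangle}\big). \tag{2.1}$$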

In the case where $\Phi$ is homogeneous, the sesquilinear form $B$ is translation
invariant and one can show the existence of a unique distribution (in the
sense of Schwartz) $Q \in \mathcal{S}'(\mathbb{R}^n)$ such that

$$\forall \phi_1, \phi_2 \in C_0^\infty(\mathbb{R}^n), \quad B(\phi_1, \phi_2) = \langle Q, \phi_1 * \tilde{\phi}_2 \rangle. \tag{2.2}$$

The distribution $Q$ is clearly positive-definite on $\mathbb{R}^n$, i.e., for all $\phi \in C_0^\infty(\mathbb{R}^n)$,
we have $\langle Q, \phi * \tilde{\phi} \rangle \ge 0$ since

$$\langle Q, \phi * \tilde{\phi} \rangle = E\big(|\langle \Phi, \phi \rangle|^2\big) \ge 0.$$

Conversely, it is known that every positive-definite distribution $Q \in \mathcal{S}'(\mathbb{R}^n)$
arises in the definition of the correlation functional of some generalized
homogeneous random field $\Phi$. Thus, $Q$ controls the "second-order theory"
of $\Phi$ and, in fact, if $\Phi$ is Gaussian, it is completely determined by $Q$. By
the Bochner-Schwartz theorem [15], the Fourier transform of $Q$ is a positive
tempered measure $\mu$ on $\hat{\mathbb{R}}^n$, i.e., $\mu \ge 0$ and for some integer $m \ge 0$, we have

$$\int_{\hat{\mathbb{R}}^n} (1 + |\xi|^2)^{-m}\,d\mu(\xi) < \infty.$$

This yields the following representation for the correlation functional:

$$\forall \phi_1, \phi_2 \in C_0^\infty(\mathbb{R}^n), \quad B(\phi_1, \phi_2) = \int_{\hat{\mathbb{R}}^n} (\mathcal{F}_n \phi_1)(\xi)\, \overline{(\mathcal{F}_n \phi_2)(\xi)}\,d\mu(\xi).$$

The measure $\mu$ is called the spectral measure of the generalized random field
$\Phi$. In the case where the positive-definite distribution $Q$ associated with $\Phi$
is radial, i.e., if

$$\forall \phi \in C_0^\infty(\mathbb{R}^n), \quad \langle Q, \phi \rangle = \langle Q, \phi^\circ \rangle,$$

where $\phi^\circ$ is the spherical average of $\phi$, we say that $\Phi$ is homogeneous and
isotropic.

In the following, we will be dealing with a basic extension problem: we
will assume that the values of the correlation functional of a homogeneous
and isotropic random field, $B(\phi_1, \phi_2)$, are only known to us when $\phi_1, \phi_2$
are supported in some finite ball centered at the origin, and we will try to
construct explicitly spectral measures consistent with the given correlation
data. We will also look at the question as to when a spectral measure
consistent with the given correlation data is unique and explain the relation
with the concept of maximum entropy.

3.  Extension  of  distribution  positive-definite  in  a  ball 

We  need  the  following  definition. 

Definition 3.1. Let $R$ be such that $0 < R \le \infty$ and suppose that $Q$ is a
distribution (in the sense of Schwartz) on $B_n(R)$. We say that $Q$ is
positive-definite on $B_n(R)$ if

$$\langle Q, \phi * \tilde{\phi} \rangle \ge 0, \qquad \forall \phi \in C_0^\infty(B_n(R/2)). \tag{3.1}$$

In that case, we will write $Q \gg 0$ on $B_n(R)$.

Let us remark that if $Q$ is actually a continuous function on $B_n(R)$ (i.e.,
$\Phi$ is a "standard" homogeneous random field), (3.1) is equivalent to

$$\sum_{i,j=1}^{k} Q(x_i - x_j)\, \xi_i\, \overline{\xi_j} \ge 0$$

for all $x_1, \ldots, x_k \in B_n(R/2)$, all $\xi_1, \ldots, \xi_k \in \mathbb{C}$, and all $k \ge 1$.

The basic extension problem stated above can be rephrased in terms
of distributions in the following way: given $Q \gg 0$ on $B_n(R)$, we try to
construct $Q_1 \gg 0$ on $\mathbb{R}^n$ such that $Q_1 = Q$ on $B_n(R)$, or, equivalently, we try
to find a positive measure $\mu \in \mathcal{S}'(\hat{\mathbb{R}}^n)$ such that $\mathcal{F}_n^{-1}(\mu) = Q$ on $B_n(R)$. In
the case n = 1, M.G. Krein [10] showed that such an extension was always
possible. However, when n > 1 and $B_n(R)$ is replaced by an n-dimensional
cube centered at the origin, the extension problem does not always have
a solution, as was shown by W. Rudin [13]. Nevertheless, it was shown,
again by W. Rudin [14], that if $Q \gg 0$ on $B_n(R)$ is a radial continuous function,
the extension was always possible. Rudin's work was later generalized
by A.E. Nussbaum [12] to include the case of radial positive-definite
distributions. As far as the uniqueness problem is concerned, necessary and
sufficient conditions were given in the one-dimensional case by M.G. Krein
[10] and E.J. Akutowicz ([1]; see also [2]) for continuous functions, but they
are not easy to check in practice. Recently, the author [8] found another
necessary and sufficient condition for nonuniqueness in the one-dimensional
case, given in terms of the continuity of a linear functional, which is valid
for distributions.
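
The pointwise criterion stated above for a continuous $Q$ lends itself to a quick numerical sanity check. The sketch below evaluates the quadratic form $\sum_{i,j} Q(x_i - x_j)\,\xi_i\,\overline{\xi_j}$ for points taken in $B_n(R/2)$; the two radial test functions (a Gaussian, which is positive-definite, and $1 + |x|^2$, which is not) and all function names are illustrative assumptions. A nonnegative value for some choices of points and coefficients is only a necessary check, whereas a single negative value disproves positive-definiteness.

```python
import math
import random


def quadratic_form(Q, points, coeffs):
    """Return sum over i, j of Q(x_i - x_j) * c_i * conj(c_j)."""
    total = 0j
    for xi, ci in zip(points, coeffs):
        for xj, cj in zip(points, coeffs):
            diff = tuple(a - b for a, b in zip(xi, xj))
            total += Q(diff) * ci * cj.conjugate()
    return total


def gaussian(x):
    """Radial Gaussian exp(-|x|^2), a classical positive-definite function."""
    return math.exp(-sum(a * a for a in x))


def paraboloid(x):
    """Radial function 1 + |x|^2, which is not positive-definite."""
    return 1.0 + sum(a * a for a in x)


def random_point_in_ball(n, radius, rng):
    """Rejection-sample a point of the open ball of the given radius in R^n."""
    while True:
        p = tuple(rng.uniform(-radius, radius) for _ in range(n))
        if sum(a * a for a in p) < radius * radius:
            return p


if __name__ == "__main__":
    rng = random.Random(0)
    R = 2.0  # Q is regarded as given on B_2(R); the points live in B_2(R/2)
    pts = [random_point_in_ball(2, R / 2, rng) for _ in range(8)]
    cs = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in pts]
    print("Gaussian:  ", quadratic_form(gaussian, pts, cs).real)
    # Two points and coefficients (1, -1) already expose the failure.
    print("Paraboloid:", quadratic_form(paraboloid, [(0.0, 0.0), (0.5, 0.0)],
                                        [1 + 0j, -1 + 0j]).real)
```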

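Theorem 3.2. Let $T \gg 0$ on $(-R, R)$. Then the extension problem for $T$ has a
nonunique solution if and only if, for some $\lambda \in \mathbb{C}$, with $\operatorname{Im} \lambda \ne 0$, there exists
$C > 0$ such that

$$\forall \varphi \in C_0^\infty((0, R)), \quad |\hat{\varphi}(\lambda)| \le C\, \big[\langle T, \varphi * \tilde{\varphi} \rangle\big]^{1/2}. \tag{3.2}$$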

It can be shown that if (3.2) holds for some $\lambda$ with $\operatorname{Im} \lambda \ne 0$, then, in fact,
it holds for all $\lambda \in \mathbb{C}$ (with the constant $C$ dependent on $\lambda$). Using this fact,
one can extend the definition of the Fourier transform to the completion of
$C_0^\infty((0, R))$ with respect to the norm

$$\|\varphi\| = \sqrt{\langle T, \varphi * \tilde{\varphi} \rangle},$$

defined for all $\varphi \in C_0^\infty((0, R))$. If we denote by $H$ that completion, then the
Fourier transform of an element $u \in H$, denoted by $\hat{u}$, is an entire analytic
function of exponential type less than or equal to $2\pi R$. Of course, $H$ is a
Hilbert space with inner product defined for the elements $\varphi, \psi \in C_0^\infty((0, R))$
by $[\varphi, \psi] = \langle T, \varphi * \tilde{\psi} \rangle$ and extended by continuity to all of $H$. It is immediate
that if $S \gg 0$ on $\mathbb{R}$ and $S = T$ on $(-R, R)$, then

$$[\varphi, \psi] = \int_{\hat{\mathbb{R}}} \hat{\varphi}(\gamma)\, \overline{\hat{\psi}(\gamma)}\,d\mu(\gamma),$$

for all $\varphi, \psi \in C_0^\infty((0, R))$ where $\mu = \hat{S}$ and this integral representation of the
inner product in $H$ extends immediately to all of $H$. It turns out that the
uniqueness problem is closely related to the notion of entropy.

Definition 3.3. Let $\mu \ge 0$ be a tempered measure on $\hat{\mathbb{R}}$ with $\mu = w + \mu_s$,
where $w \in L^1_{\mathrm{loc}}(\hat{\mathbb{R}})$ and $\mu_s$ is singular. Then $\mu$ is said to have finite entropy if

$$\int_{\hat{\mathbb{R}}} \frac{\log w(\gamma)}{1 + \gamma^2}\,d\gamma > -\infty.$$
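
As a quick illustration of the definition (the two weights below are chosen here for convenience and do not come from the text), the weight $w(\gamma) = 1/(1+\gamma^2)$ has finite entropy, while the exponentially decaying weight $w(\gamma) = e^{-|\gamma|}$ does not. The first integral is computed with the substitution $\gamma = \tan\theta$ and the classical value $\int_0^{\pi/2} \log\cos\theta\,d\theta = -\tfrac{\pi}{2}\log 2$.

```latex
\begin{aligned}
w(\gamma) = \frac{1}{1+\gamma^2}: \quad
\int_{-\infty}^{\infty} \frac{\log w(\gamma)}{1+\gamma^2}\,d\gamma
  &= -\int_{-\infty}^{\infty} \frac{\log(1+\gamma^2)}{1+\gamma^2}\,d\gamma
   = 4\int_{0}^{\pi/2} \log\cos\theta\,d\theta
   = -2\pi\log 2 > -\infty, \\[4pt]
w(\gamma) = e^{-|\gamma|}: \quad
\int_{-\infty}^{\infty} \frac{\log w(\gamma)}{1+\gamma^2}\,d\gamma
  &= -2\int_{0}^{\infty} \frac{\gamma}{1+\gamma^2}\,d\gamma
   = -\infty .
\end{aligned}
```

Roughly speaking, finite entropy forbids the absolutely continuous part from being too small on too large a set: polynomial decay is harmless, exponential decay is not.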


One can show (see [8]) that if $\mu$ is a measure with finite entropy and
$\mathcal{F}_1^{-1}\mu = T$ on $(-R, R)$, then the extension problem for $T$ always has more
than one solution. Conversely, if the extension is nonunique, there exist
positive-definite extensions whose Fourier transforms are measures with finite
entropy and, in fact, the entropy maximizers corresponding to certain
logarithmic integrals depending on the complex parameter $\lambda$ can be computed
explicitly. More precisely, if $\lambda \in \mathbb{C}$ and $\operatorname{Im} \lambda > 0$, let us denote by $u_\lambda$
the unique element of $H$ satisfying

$$\forall \varphi \in C_0^\infty((0, R)), \quad \hat{\varphi}(\lambda) = [\varphi, u_\lambda]. \tag{3.3}$$

It can be shown that $\hat{u}_\lambda$ does not vanish on the real axis and thus one can
define the weight $v_\lambda$ by the formula

$$\forall \gamma \in \hat{\mathbb{R}}, \quad v_\lambda(\gamma) = \frac{\operatorname{Im} \lambda\, \|u_\lambda\|^2}{\pi\, |\hat{u}_\lambda(\gamma)|^2\, |\lambda - \gamma|^2}. \tag{3.4}$$


We  have  the  following  theorem. 


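Theorem 3.4. Let $T \gg 0$ on $(-R, R)$ satisfy (3.2) and, if $\operatorname{Im} \lambda > 0$, let $v_\lambda$
be the weight defined by (3.3) and (3.4). Then $v_\lambda \in \mathcal{S}'(\hat{\mathbb{R}})$, $\mathcal{F}_1^{-1} v_\lambda = T$ on
$(-R, R)$, and, furthermore, if $\mu \ge 0$ is any measure in $\mathcal{S}'(\hat{\mathbb{R}})$ with absolutely
continuous part $w$ and $\mathcal{F}_1^{-1}\mu = T$ on $(-R, R)$, we have the entropy inequality

$$\frac{\operatorname{Im} \lambda}{\pi} \int_{\hat{\mathbb{R}}} \frac{\log w(\gamma)}{|\lambda - \gamma|^2}\,d\gamma \le \frac{\operatorname{Im} \lambda}{\pi} \int_{\hat{\mathbb{R}}} \frac{\log v_\lambda(\gamma)}{|\lambda - \gamma|^2}\,d\gamma,$$

with equality if and only if $\mu = v_\lambda$.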

This theorem was proved in [8]. A version of it was proved independently
by H. Dym (see [7]), but in the case of matrix-valued functions
and with stronger assumptions on T. Let us mention here that the concept of
maximum entropy in connection with the extension problem was introduced
by J.P. Burg ([5]; see also [3, 11]) in the discrete case and by J. Chover [6] in
the continuous one. Another interesting version of the entropy inequality
stated above can be found in [4].

4. Higher dimension: the radial case

As mentioned earlier, the n-dimensional extension problem was solved in
the radial case by W. Rudin [14] for continuous positive-definite functions
on $B_n(R)$ and by A.E. Nussbaum [12] for positive-definite distributions on
$B_n(R)$. However, the existence of the extension is obtained by Rudin via
a Hahn-Banach argument and by Nussbaum via an abstract spectral theorem
in nuclear spaces. This, of course, makes it difficult to obtain explicit
formulas for possible extensions. Since such formulas are available in the
one-dimensional case (by the maximum entropy method, for example), it is
of practical interest to try to reduce the study of the n-dimensional radial
case, which is essentially "one-dimensional," to the one-dimensional one. It
turns out that such a reduction is possible and an important ingredient for
doing so is the following lemma used by W. Rudin in [14].

Lemma 4.1 (W. Rudin). Let $\phi \in C_0^\infty(B_n(R))$ with $\phi$ radial and suppose
that $\mathcal{F}_n \phi \ge 0$. Then there exists a sequence $\{\phi_k\}$ with $\phi_k \in C_0^\infty(B_n(R/2))$
such that

$$\phi = \sum_k \phi_k * \tilde{\phi}_k \tag{4.1}$$

in the sense of convergence in $C_0^\infty(B_n(R))$.

Remark 4.2. The $\phi_k$'s in the previous lemma are not radial in general.
Lemma 4.1 is the most technical part of Rudin's proof of the existence of the
extension in the radial case and its proof uses the Hadamard factorization
theorem. Rudin only mentions the uniform convergence of the series in (4.1)
(since this is all he needs), but the convergence in $C_0^\infty(B_n(R))$ follows easily
from his argument.

Rudin's lemma will allow us to associate with any radial distribution
$Q \gg 0$ on $B_n(R)$ an even one-dimensional distribution $T \gg 0$ on $(-R, R)$.

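Lemma 4.3. Let $Q \gg 0$ on $B_n(R)$ be radial. Then the distribution $T$ defined
on $(-R, R)$ by the formula

$$\forall \varphi \in C_0^\infty((-R, R)), \quad \langle T, \varphi \rangle = \langle Q, \mathcal{K}\varphi \rangle, \tag{4.2}$$

where

$$\mathcal{K}\varphi = \mathcal{F}_n^{-1}\left[\frac{\hat{\varphi}(|\xi|) + \hat{\varphi}(-|\xi|)}{2}\right],$$

is even and positive-definite on $(-R, R)$.

Proof. Since, by the Paley-Wiener theorem (see [15]), $\hat{\varphi}$ is the restriction to
$\hat{\mathbb{R}}$ of an entire function of exponential type less than $2\pi R$, it follows easily
that the function $\psi(\xi) = [\hat{\varphi}(|\xi|) + \hat{\varphi}(-|\xi|)]/2$ for $\xi \in \hat{\mathbb{R}}^n$ is the restriction
to $\hat{\mathbb{R}}^n$ of an entire function of exponential type less than $2\pi R$ defined on
$\mathbb{C}^n$. Hence, by the Paley-Wiener theorem again, $\mathcal{K}\varphi$ belongs to $C_0^\infty(B_n(R))$
and (4.2) is well defined. It is also easily checked that the mapping $\mathcal{K} :
C_0^\infty((-R, R)) \to C_0^\infty(B_n(R))$ is continuous. Thus (4.2) defines a distribution
$T$ on $(-R, R)$ which is clearly even. Let us show that $T \gg 0$ on $(-R, R)$. If
$\varphi \in C_0^\infty((0, R))$, we have

$$\mathcal{F}_n \mathcal{K}(\varphi * \tilde{\varphi}) = \big[|\hat{\varphi}|^2(|\xi|) + |\hat{\varphi}|^2(-|\xi|)\big]/2 \ge 0$$

and thus, by Lemma 4.1, there exists a sequence $\{\phi_k\}$ in $C_0^\infty(B_n(R/2))$ such
that $\mathcal{K}(\varphi * \tilde{\varphi}) = \sum_k \phi_k * \tilde{\phi}_k$ in the sense of convergence of test functions in
$C_0^\infty(B_n(R))$. Therefore,

$$\langle T, \varphi * \tilde{\varphi} \rangle = \langle Q, \mathcal{K}(\varphi * \tilde{\varphi}) \rangle = \sum_k \langle Q, \phi_k * \tilde{\phi}_k \rangle \ge 0,$$

which proves the lemma. ∎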

Our  next  goal  is  to  solve  the  extension  problem  for  Q,  taking  for  granted 
that  we  can  do  it  for  T.  We  need  to  introduce  the  following  definition. 

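Definition 4.4. In the following, we will denote by $\Omega$ a measurable subset
of $S_n$ having the property that

$$\sigma \in \Omega \ \text{ if and only if } \ -\sigma \notin \Omega \tag{4.3}$$

for a.e. $(d\sigma)$ $\sigma \in S_n$. We define the function $\operatorname{sign}_\Omega$ by $\operatorname{sign}_\Omega(0) = 0$ and, if
$\xi \ne 0$, $\operatorname{sign}_\Omega(\xi) = 1$ for $\xi/|\xi| \in \Omega$, and $\operatorname{sign}_\Omega(\xi) = -1$ for $\xi/|\xi| \notin \Omega$.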


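Theorem 4.5. Let $R > 0$, let $Q \gg 0$ on $B_n(R)$ be radial and consider the
distribution $T \gg 0$ on $(-R, R)$ defined by (4.2). Suppose that the positive
measure $\nu \in \mathcal{S}'(\hat{\mathbb{R}})$ satisfies $\mathcal{F}_1^{-1}\nu = T$ on $(-R, R)$. Then, the measure
$\mu \in \mathcal{S}'(\hat{\mathbb{R}}^n)$ defined by the formula

$$\forall \phi \in C_0^\infty(\hat{\mathbb{R}}^n), \quad \langle \mu, \phi \rangle = \int_{\hat{\mathbb{R}}} \left[ 2\,|S_n|^{-1} \int_{\Omega} \phi(\tau\sigma)\,d\sigma \right] d\nu(\tau) \tag{4.4}$$

satisfies $\mathcal{F}_n^{-1}\mu = Q$ on $B_n(R)$.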


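Proof. We first remark that if $\phi \in C_0^\infty(B_n(R))$, if $\sigma \in S_n$, and if we denote
by $dS$ the (n - 1)-dimensional Lebesgue measure on the hyperplane $\{u \in
\mathbb{R}^n,\ \langle \sigma, u \rangle = 0\}$, the function $\phi_\sigma$ defined by

$$\forall t \in \mathbb{R}, \quad \phi_\sigma(t) = \int_{\langle \sigma, u \rangle = 0} \phi(\sigma t + u)\,dS(u)$$

belongs to $C_0^\infty((-R, R))$. Furthermore, it follows easily from Fubini's theorem
that $\hat{\phi}_\sigma(\tau) = (\mathcal{F}_n \phi)(\tau\sigma)$, for all $\tau \in \hat{\mathbb{R}}$, and it is clear that $\check{\phi}_\sigma = \phi_{-\sigma}$,
where $\check{\phi}_\sigma(t) = \phi_\sigma(-t)$. Hence, we have

$$\begin{aligned}
\int_{\hat{\mathbb{R}}^n} (\mathcal{F}_n \phi)(\xi)\,d\mu(\xi)
&= 2\,|S_n|^{-1} \int_{\hat{\mathbb{R}}} \int_{\Omega} (\mathcal{F}_n \phi)(\tau\sigma)\,d\sigma\,d\nu(\tau) \\
&= 2\,|S_n|^{-1} \int_{\Omega} \left[ \int_{\hat{\mathbb{R}}} \hat{\phi}_\sigma(\tau)\,d\nu(\tau) \right] d\sigma \\
&= 2\,|S_n|^{-1} \int_{\Omega} \langle T, \phi_\sigma \rangle\,d\sigma \\
&= |S_n|^{-1} \int_{\Omega} \langle T, \phi_\sigma + \check{\phi}_\sigma \rangle\,d\sigma \quad \text{(since $T$ is even)} \\
&= |S_n|^{-1} \int_{\Omega} \langle T, \phi_\sigma + \phi_{-\sigma} \rangle\,d\sigma \\
&= |S_n|^{-1} \int_{S_n} \langle T, \phi_\sigma \rangle\,d\sigma \\
&= \int_{\hat{\mathbb{R}}} \left[ |S_n|^{-1} \int_{S_n} \hat{\phi}_\sigma(\tau)\,d\sigma \right] d\nu(\tau).
\end{aligned}$$

Now, it is clear that the function $\psi$ defined by

$$\psi = \mathcal{F}_1^{-1}\left[ |S_n|^{-1} \int_{S_n} \hat{\phi}_\sigma(\tau)\,d\sigma \right]$$

belongs to $C_0^\infty((-R, R))$ and is even. Using the previous computation, we
have thus

$$\begin{aligned}
\langle \mu, \mathcal{F}_n \phi \rangle = \langle T, \psi \rangle &= \langle Q, \mathcal{K}\psi \rangle \\
&= \langle Q, \mathcal{F}_n^{-1}[(\mathcal{F}_n \phi)^\circ] \rangle \\
&= \langle Q, \phi^\circ \rangle = \langle Q, \phi \rangle,
\end{aligned}$$

since the Fourier transform commutes with orthogonal transformations and
$Q$ is radial. This shows that $\mathcal{F}_n^{-1}\mu = Q$ on $B_n(R)$. ∎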

Remark 4.6. If $d\nu = v(\tau)\,d\tau$ where $v$ is a positive function, then $d\mu = w(\xi)\,d\xi$
where $w(\xi) = 2\,|S_n|^{-1}\, v(|\xi| \operatorname{sign}_\Omega(\xi))\, |\xi|^{-(n-1)}$.
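
One consequence of this formula is that the passage from $v$ to $w$ preserves total mass: in polar coordinates the factor $|\xi|^{-(n-1)}$ cancels the Jacobian, and each of $\Omega$ and its complement carries half of $|S_n|$. The sketch below checks this numerically in the plane (n = 2). The choice of $\Omega$ as the open upper half-circle, the density `v_example`, and the midpoint quadrature are all assumptions made for illustration.

```python
import math


def sign_omega(xi):
    """sign_Omega for Omega = {sigma on the unit circle : sigma_2 > 0} (assumed choice)."""
    if xi[0] == 0.0 and xi[1] == 0.0:
        return 0
    return 1 if xi[1] > 0.0 else -1


def weight(v, xi):
    """w(xi) = 2 |S_2|^{-1} v(|xi| sign_Omega(xi)) |xi|^{-(n-1)} with n = 2, |S_2| = 2 pi."""
    r = math.hypot(xi[0], xi[1])
    return 2.0 / (2.0 * math.pi) * v(r * sign_omega(xi)) / r


def total_mass(v, r_max=8.0, nr=800, ntheta=360):
    """Midpoint-rule approximation of the integral of w over the disc |xi| < r_max."""
    dr = r_max / nr
    dtheta = 2.0 * math.pi / ntheta
    total = 0.0
    for i in range(nr):
        r = (i + 0.5) * dr
        for j in range(ntheta):
            theta = (j + 0.5) * dtheta
            xi = (r * math.cos(theta), r * math.sin(theta))
            total += weight(v, xi) * r * dr * dtheta  # r dr dtheta is the area element
    return total


def v_example(tau):
    """A positive, non-even density on the line (illustrative assumption)."""
    return math.exp(-(tau - 1.0) ** 2)


if __name__ == "__main__":
    print("mass of w on the plane:", total_mass(v_example))
    print("mass of v on the line: ", math.sqrt(math.pi))
```

Because `v_example` is not even, the resulting weight takes different values on the two half-planes, which illustrates that the measure built from a non-even density is not radial.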

It should be pointed out that the measure $\mu$ constructed above is not
radial in general, unless $\nu$ is even. In that case, $\mu$ can be defined by the
formula

$$\forall \phi \in C_0^\infty(\hat{\mathbb{R}}^n), \quad \langle \mu, \phi \rangle = \int_{\hat{\mathbb{R}}} \left[ |S_n|^{-1} \int_{S_n} \phi(\tau\sigma)\,d\sigma \right] d\nu(\tau). \tag{4.5}$$

Conversely, if $\mu$ is radial, it can be written in the form (4.5) where $\nu$ is defined
by  the  formula 


V(p  e  (v.cp)  = 


duly. 


Therefore, by Theorem 4.5, there is a one-to-one correspondence between
radial positive-definite extensions of $Q$ and even positive extensions of $T$.
One obtains in this way, using Theorem 3.2, the following characterization
for the "nonuniqueness" of the radial positive-definite extensions of $Q$.

Corollary to 4.6. Let $R > 0$ and let $Q \gg 0$ on $B_n(R)$ be radial. Then, there
exists a nonunique radial distribution $Q_1 \gg 0$ on $\mathbb{R}^n$ with $Q_1 = Q$ on $B_n(R)$
if and only if the associated one-dimensional distribution $T$ defined by (4.2)
satisfies (3.2).


5.  Maximum  entropy 

It is now clear that, when the n-dimensional radial extension problem has
a nonunique solution, we can use the maximum entropy method (i.e.,
Theorem 3.4) to provide us with explicit solutions by using Theorem 4.5. More
explicitly, if $v_\lambda$ is the weight defined in (3.4), we know that $\mathcal{F}_1^{-1} v_\lambda = T$ on
$(-R, R)$ and therefore the weight $w_\lambda^\Omega$ defined by

$$\forall \xi \in \hat{\mathbb{R}}^n, \quad w_\lambda^\Omega(\xi) = 2\,|S_n|^{-1}\, v_\lambda(|\xi| \operatorname{sign}_\Omega(\xi))\, |\xi|^{-(n-1)}$$

satisfies $\mathcal{F}_n^{-1} w_\lambda^\Omega = Q$ on $B_n(R)$ for all choices of $\Omega$ satisfying (4.3). Of course, one
can wonder, in view of Theorem 3.4, if the weights $w_\lambda^\Omega$ are the entropy
maximizers associated with certain n-dimensional logarithmic integrals. This
turns out to be the case, but in order to obtain a result valid in full generality,
one has to restrict the complex parameter $\lambda$ to vary on the positive imaginary
axis. Furthermore, in that case the associated weight turns out to be
radial (and thus independent of $\Omega$) and the logarithmic integral considered
has a much nicer form. We need the following lemma.

Lemma 5.1. Let $R > 0$ and let us assume that the distribution $T \gg 0$ on
$(-R, R)$ associated with $Q$ satisfies (3.2). Let $t > 0$ and consider $u_{it}$, the
unique element of $H$ satisfying (3.3) with $\lambda = it$. Then, we have

$$\forall \gamma \in \hat{\mathbb{R}}, \quad |\hat{u}_{it}(-\gamma)| = |\hat{u}_{it}(\gamma)| \tag{5.1}$$

and, in particular, the weight $W_t$ defined by

$$\forall \xi \in \hat{\mathbb{R}}^n, \quad W_t(\xi) = 2\,|S_n|^{-1}\, v_{it}(|\xi|)\, |\xi|^{-(n-1)}$$

satisfies $\mathcal{F}_n^{-1}(W_t) = Q$ on $B_n(R)$.

Proof. Since $T \gg 0$ on $(-R, R)$ and $T$ is even, $T$ must be real. In particular,
this implies that if $u \in H$, then $\bar{u} \in H$ and $\|\bar{u}\| = \|u\|$. Now by definition of
$u_{it}$, we have that, for all $\varphi \in C_0^\infty((0, R))$,


fui,,(p]  =  (T,  (uit)^ 


«P)  =  [ 


(p(x)  dx 


=  e--2’'’‘(p{x)dx  =  (T,  (uii)^  »  <p) 

=  (T,  (uiT)^  *  <p)  =  (ui7,  <pl. 

Hence, $\overline{u_{it}} = u_{it}$ by uniqueness, and thus, $\hat{u}_{it}(-\gamma) = \overline{\hat{u}_{it}(\gamma)}$, which
proves (5.1). This clearly implies that $v_{it}$ defined by (3.4) is even and thus
$\mathcal{F}_n^{-1}(W_t) = Q$ on $B_n(R)$ by Theorem 4.5. ∎

We  can  now  state  a  maximum  entropy  theorem  for  extensions  of  radial 
positive-definite  distributions. 

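Theorem 5.2. Consider a radial distribution $Q \gg 0$ on $B_n(R)$, $0 < R < \infty$,
and suppose that the associated distribution $T$ satisfies (3.2). Let $\mu$ be a
positive tempered measure on $\hat{\mathbb{R}}^n$ with $\mu = w + \mu_s$, where $w \in L^1_{\mathrm{loc}}(\hat{\mathbb{R}}^n)$
and $\mu_s$ is singular. Then, if $\mathcal{F}_n^{-1}(\mu) = Q$ on $B_n(R)$ and $t > 0$, we have the
entropy inequality

$$\int_{\hat{\mathbb{R}}^n} \frac{\log w(\xi)}{(t^2 + |\xi|^2)\,|\xi|^{n-1}}\,d\xi \le \int_{\hat{\mathbb{R}}^n} \frac{\log W_t(\xi)}{(t^2 + |\xi|^2)\,|\xi|^{n-1}}\,d\xi, \tag{5.2}$$

with equality if and only if $\mu = W_t$.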

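Proof. We note first that, passing to polar coordinates,

$$\int_{\hat{\mathbb{R}}^n} \frac{2t}{\pi\,|S_n|\,(t^2 + |\xi|^2)\,|\xi|^{n-1}}\,d\xi = \frac{2t}{\pi} \int_0^\infty \frac{dr}{t^2 + r^2} = 1.$$

Therefore we obtain, using Jensen's inequality and Lemma 5.1, that

$$\begin{aligned}
\exp\left\{ \int_{\hat{\mathbb{R}}^n} \frac{2t\,\log\big(w(\xi)/W_t(\xi)\big)}{\pi\,|S_n|\,(t^2 + |\xi|^2)\,|\xi|^{n-1}}\,d\xi \right\}
&\le \int_{\hat{\mathbb{R}}^n} \frac{2t\,w(\xi)}{\pi\,|S_n|\,(t^2 + |\xi|^2)\,|\xi|^{n-1}\,W_t(\xi)}\,d\xi \\
&= \int_{\hat{\mathbb{R}}^n} \frac{t}{\pi}\, \frac{w(\xi)}{(t^2 + |\xi|^2)\, v_{it}(|\xi|)}\,d\xi \\
&= \int_{\hat{\mathbb{R}}^n} |\hat{u}_{it}(|\xi|)|^2\, w(\xi)\,d\xi\ \|u_{it}\|^{-2} \\
&\le \int_{\hat{\mathbb{R}}^n} |\hat{u}_{it}(|\xi|)|^2\,d\mu(\xi)\ \|u_{it}\|^{-2} \\
&= \langle T, (u_{it})^{\sim} * u_{it} \rangle\, \|u_{it}\|^{-2} = 1,
\end{aligned}$$

and (5.2) follows. Since equality in Jensen's inequality only occurs for constant
functions, an equality in (5.2) implies that $\mu_s = 0$ and $w = W_t$. ∎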

Remark 5.3. The measure $\mu$ in Theorem 5.2 need not be radial. Let us also
mention that some of the computations involving $u_{it}$ in the proof above are
a bit formal, but they can be easily justified by considering a sequence in
$C_0^\infty((0, R))$ converging to $u_{it}$ in $H$.

1. Introduction

In Sections 4 through 11 of this paper we study the phase behaviour, on the
unit circle, of the so-called "ultraflat unimodular polynomials," the existence
of which has been known since Kahane's celebrated 1980 paper [7]. Before doing so
we recall, in Sections 2 and 3, some definitions and historical background.

Throughout this paper, the implied constants in the $O$ notation of Landau
are understood as absolute. A notation such as $O_\delta$ means that the
implied constant depends only on the parameter $\delta$.

2.  Some  historical  background 

As in Littlewood [9], let $S_n$ denote the class of those polynomials $P(z) =
\sum_{k=0}^{n} a_k z^k$ which are unimodular, i.e., all of whose coefficients are complex
numbers of modulus 1:

$$|a_k| = 1 \quad \text{for all } k = 0, 1, \ldots, n.$$

By Parseval's formula $\int_0^1 |P(e^{2\pi i\theta})|^2\,d\theta = n + 1$, we then have (for $n \ge 1$)

$$\min_{|z|=1} |P(z)| < \sqrt{n+1} < \max_{|z|=1} |P(z)|. \tag{2.1}$$
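
Parseval's formula and the two-sided bound (2.1) are easy to observe numerically. The sketch below samples $|P(e^{it})|$ on an equispaced grid for a unimodular polynomial with quadratic phases; the specific coefficients $a_k = \exp(i\pi k^2/(n+1))$ are a Gauss-sum-type choice assumed here for illustration and are not claimed to reproduce the normalization of any formula in the text. For more than n sample points the discrete mean of $|P|^2$ equals $n + 1$ exactly (up to rounding), by orthogonality of the characters.

```python
import cmath
import math


def gauss_type_coefficients(n):
    """Unimodular coefficients a_k = exp(i*pi*k^2/(n+1)), k = 0..n (assumed choice)."""
    return [cmath.exp(1j * math.pi * k * k / (n + 1)) for k in range(n + 1)]


def moduli_on_circle(coeffs, samples):
    """Return |P(e^{it})| at t = 2*pi*j/samples, evaluating P by Horner's rule."""
    values = []
    for j in range(samples):
        z = cmath.exp(2j * math.pi * j / samples)
        p = 0j
        for a in reversed(coeffs):
            p = p * z + a
        values.append(abs(p))
    return values


if __name__ == "__main__":
    n = 40
    mods = moduli_on_circle(gauss_type_coefficients(n), 4 * (n + 1))
    mean_square = sum(m * m for m in mods) / len(mods)
    print("mean of |P|^2:", mean_square, " n + 1:", n + 1)
    print("min |P|:", min(mods), " sqrt(n+1):", math.sqrt(n + 1), " max |P|:", max(mods))
```

The sampled minimum and maximum only bracket the true extrema from inside, but they already straddle $\sqrt{n+1}$, as (2.1) requires.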

An  old  problem  (or  rather  an  old  theme)  is  this: 

Problem (Littlewood's flatness problem). How close can such a unimodular
polynomial come to satisfying

$$|P(z)| = \sqrt{n+1} \quad \text{on the whole unit circle } |z| = 1? \tag{2.2}$$

We insist on the (obvious) fact that (2.2) is impossible if $n \ge 1$. So
one must look for less than (2.2), but then there are various ways of seeking
such an "approximate situation." One way is this: in 1966 Littlewood [9]
conjectured the existence of unimodular polynomials of arbitrarily large
degrees which are flat on the unit circle, that is, such that

$$B\sqrt{n+1} \le |P(z)| \le A\sqrt{n+1} \quad (\text{whenever } |z| = 1) \tag{2.3}$$

where $A$ and $B$ are positive absolute constants (satisfying, of course, $0 < B <
1 < A$). We do not know who coined the term "flat," nowadays commonly
used to describe those $P \in S_n$ which satisfy (2.3): it was not mentioned
by Littlewood [9], but became customary after Körner [8] proved, in 1980,
Littlewood's conjecture on the existence of such polynomials.

Let us emphasize that the important aspect of the above conjecture
of Littlewood (now Körner's theorem) is really the lower bound $B\sqrt{n+1}$
in (2.3). Indeed, if we just require polynomials $P \in S_n$ with the upper
bound condition

$$\max_{|z|=1} |P(z)| \le A\sqrt{n+1} \quad (A = \text{some absolute constant}), \tag{2.4}$$

then  as  early  as  1914  Bernstein  [3]  proved  in  essence,  as  a  lemma  for  the 
study  of  absolute  convergence  of  Fourier  series,  that  the  polynomial 

n 

G(z)  =  ^  ‘  ■  z^"  (a  =  real  constant  0)  (2.5) 

k  0 

indeed satisfies the upper-bound inequality (2.3), with $A = A(\alpha)$ depending
only on $\alpha$. See Bari's book [1] for a simplified version of Bernstein's proof.

In particular the constant $A$ becomes absolute in the case of

U 

G,(z)  =  z^  (2.6) 

k  0 

and 


G2(z)  =  Y_  z*-  =:  Gi(zc‘'’ 

k  0 


(2.7) 


The polynomials $G(z)$, $G_1(z)$, and $G_2(z)$ are often called Gauss polynomials
because of their obvious connection with Gauss sums. Since Bernstein's
early work, various examples of $P \in S_n$ satisfying (2.4) have been found
and much research has been done on them. (For an account of some of
the work done until the mid-1960s, see Littlewood's book [10, pp. 25-32].)
A fairly complete account of this topic alone would require a respectable
expository paper. Yet, in this paper, I will resist the temptation to digress
into any example other than the already mentioned Gauss polynomials and
the example given by relation (2.10) below.

What led to Körner's existence proof for (2.3) has a rather interesting
history. It was known to Littlewood [9] that the (special) Gauss polynomials
$G_1(z)$ and $G_2(z)$ defined by (2.6) and (2.7) have the following surprising
properties: for any $\delta$ with $0 < \delta < 1/2$, we have

|Gi(e'Mi  =  \/n.+  1  +  05(n*)  outside  0  $  |tl  ^  rL“*  (2.8) 

and (equivalently) a similar estimate for $|G_2(e^{it})|$, but we also have

min|G,(e'‘)l  =min|G2{e“)|  =  Os(Tt*).  (2.9) 

I  t 

Thus, because of (2.8), the Gauss polynomials $G_1$ and $G_2$ almost satisfy
Littlewood's condition (2.3) with nearly optimal constants but, because of
(2.9), they just fail to satisfy the lower bound condition in (2.3). In his 1977
paper [4], Byrnes proved that the unimodular polynomial (of degree $n^2 - 1$)

$$B(z) := \sum_{k=0}^{n-1} \sum_{r=0}^{n-1} \omega^{kr} z^{nk+r} \qquad (\omega = e^{2\pi i/n}) \tag{2.10}$$

has properties remarkably similar to those of $G_1(z)$ and $G_2(z)$. In particular
he proved, for $B(z)$, estimates somewhat sharper than (2.8) and (2.9), with
much simpler proofs, and also extended his estimates to arbitrary degrees. In
addition Byrnes used the same method to prove that by suitably "perturbing"
0(n’  ■*)  terms  of  P(e“)  for  some  P  €  Sn/  one  obtains  a  function  f(t)  such 
that,  for  every  real  t, 

lf(t)|  =  v^n  +  1  +  0(n'  ■*)  (uniformly  in  t  and  n).  (2.11) 

Then Körner [8] proved, via a modification of Byrnes's last construction
together with the use of a probabilistic idea, the existence of some $P \in S_n$
satisfying (2.3).

3.  Ultraflat  polynomials 

In the same 1966 paper [9], Littlewood had also suggested that, conceivably,
there might even exist a sequence $(P_n)$ of polynomials in $S_n$ (possibly even
with coefficients all equal to $\pm 1$) such that $(n + 1)^{-1/2}\,|P_n(e^{it})|$ converges to
1 uniformly in $t$. We shall call such sequences of unimodular polynomials
"ultraflat." More precisely, we shall give the following definition:

Definition. Given a sequence $(\varepsilon_n)$ of positive numbers tending to zero, we
shall say that a sequence $(P_m)$ of unimodular polynomials is $(\varepsilon_n)$-ultraflat if
$\deg P_m \to \infty$ as $m \to \infty$ and if, for $|z| = 1$,

$$(1 - \varepsilon_n)\sqrt{n+1} \le |P_m(z)| \le (1 + \varepsilon_n)\sqrt{n+1} \quad (\text{where } n = \deg P_m) \tag{3.1}$$

or, equivalently,

$$\max_{|z|=1} \big|\, |P_m(z)| - \sqrt{n+1}\, \big| \le \varepsilon_n \sqrt{n+1} \quad (n = \deg P_m). \tag{3.2}$$
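
Condition (3.2) suggests a simple numerical diagnostic: for a given unimodular polynomial, sample the relative deviation of $|P(z)|$ from $\sqrt{n+1}$ on the unit circle and take the largest value. The sketch below does this; the function name `flatness_deviation` and the grid size are assumptions, and since only finitely many points are sampled the result is a lower bound for the true maximum in (3.2), not a certificate of ultraflatness.

```python
import cmath
import math


def flatness_deviation(coeffs, samples=2048):
    """Sampled value of max over |z| = 1 of | |P(z)| - sqrt(n+1) | / sqrt(n+1).

    coeffs[k] is the coefficient of z^k and n = len(coeffs) - 1 is the degree.
    """
    n = len(coeffs) - 1
    target = math.sqrt(n + 1)
    worst = 0.0
    for j in range(samples):
        z = cmath.exp(2j * math.pi * j / samples)
        p = 0j
        for a in reversed(coeffs):  # Horner's rule
            p = p * z + a
        worst = max(worst, abs(abs(p) - target) / target)
    return worst


if __name__ == "__main__":
    # P(z) = 1 + z vanishes at z = -1, so it is as far from flat as possible.
    print(flatness_deviation([1, 1]))
    # A polynomial with all coefficients equal to 1 peaks at z = 1 with value n + 1.
    print(flatness_deviation([1] * 17))
```

For a sequence to be $(\varepsilon_n)$-ultraflat, the true counterpart of this quantity must be at most $\varepsilon_n$ along the whole sequence.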

In looser terms, we shall simply say that a unimodular polynomial $P(z)$
of large degree (i.e., going to infinity) is ultraflat if $P = P_m$ for some $m$, where
$(P_m)$ is an $(\varepsilon_n)$-ultraflat sequence for some suitable $(\varepsilon_n)$ tending to zero.

Despite Körner's above-mentioned result on the existence of "flat"
unimodular polynomials, the existence of ultraflat unimodular polynomials
seemed very unlikely, in view of a 1957 conjecture of P. Erdős (problem 22 in
[5]) asserting that, for all $P \in S_n$ with $n \ge 1$,

$$\max_{|z|=1} |P(z)| \ge (1 + C)\sqrt{n+1} \tag{3.3}$$

where $C$ is some positive absolute constant. Yet, shortly after Körner's proof,
Kahane [7] further refined Körner's method and proved that there exists a
sequence $(P_n)_{n \ge 1}$, with $P_n \in S_n$, which is $(\varepsilon_n)$-ultraflat, where

$$\varepsilon_n = O\big(n^{-1/17}\sqrt{\log n}\big). \tag{3.4}$$

Thus the Erdős conjecture (3.3) was disproved (in the case of the class $S_n$).
For the more restricted class $\mathcal{F}_n$ of those $P(z)$, all of whose coefficients are $\pm 1$,
the analogous Erdős conjecture remains unsettled to this date (end of 1991).
We conjecture that, for the $\pm 1$ polynomials, it is true, and consequently we
conjecture that there are no ultraflat polynomials with only $\pm 1$ coefficients.

Some additional remarks on Kahane's breakthrough are made in Section 12.
For the moment let us insist that the ultraflat polynomials $P \in S_n$
whose phase behaviour we shall be studying below are not necessarily
those of Kahane's paper [7]: we shall consider arbitrary ultraflat polynomials
$P \in S_n$, only assumed to satisfy (3.1) or (3.2) for some sequence $(\varepsilon_n)$
tending to zero.


4.  The  phase  problem:  the  main  conjectures 

We shall henceforth suppose that $P \in S_n$ $(n \to \infty)$ is $(\varepsilon_n)$-ultraflat, that is,
satisfies (3.1). Write

$P(e^{it}) = R(t)\,e^{i\alpha(t)}$ with $R(t) = |P(e^{it})|$.  (4.1)

We think of $t$ as time. The ultraflatness condition (3.1) means that the
mobile point $P(e^{it})$ moves inside a narrow annulus centered at the origin
and of inner (resp. outer) radius $(1-\varepsilon_n)\sqrt{n+1}$ (resp. $(1+\varepsilon_n)\sqrt{n+1}$). Our
purpose is the phase problem, i.e., the study of the phase $\alpha(t)$, or rather the
(instantaneous) angular speed $\alpha'(t)$. Writing

$P(e^{it}) = \sum_{k=0}^{n} \exp(ikt + i\theta_k)$  ($\theta_k \in \mathbb{R}$ for all $k$),

we see that we have $n+1$ unit vectors whose endpoints $\exp(ikt + i\theta_k)$ rotate
along the unit circle with (respective) constant angular speeds $0, 1, 2, \ldots, n$.
That $P(z)$ is ultraflat is equivalent to saying that there is a choice of the initial
positions $\exp(i\theta_k)$ $(k = 0, 1, 2, \ldots, n)$ so that the resultant vector has endpoint
$P(e^{it})$ moving in the above-mentioned narrow annulus. Our intuition
tells us (or at least mine did, when I considered the problem) two things.
First that, since the "components" $\exp(ikt + i\theta_k)$ have (respective) angular
speeds $0, 1, 2, \ldots, n$, then the "resultant angular speed" is approximately
their average; in other words, we might expect to have

$\alpha'(t) = n/2 + o(n)$.  (4.2)

We shall see that (4.2) is trivially true in average, that is,

$\frac{1}{2\pi}\int_0^{2\pi} \alpha'(t)\,dt = n/2 + O(n\varepsilon_n)$  (4.3)

but that (4.2) itself is far from being true. Indeed we shall prove (Theorem 5.2)
that $\alpha'(t)$ takes values at least as large as $2n/3 + O(n\varepsilon_n)$ and as small as
$n/3 + O(n\varepsilon_n)$.

Secondly, our intuition tells us that, since all the components $\exp(ikt + i\theta_k)$
turn counter-clockwise, then so does their resultant $P(e^{it})$, modulo
negligible fluctuations; in other words,

$\min_{0 \le t \le 2\pi} \alpha'(t) \ge o(n)$.  (4.4)

Now (4.4) is indeed true: we shall prove (Theorem 5.2) that

$O(n\varepsilon_n) \le \alpha'(t) \le n + O(n\varepsilon_n)$.  (4.5)

Actually we conjecture that

$\min_{0 \le t \le 2\pi} \alpha'(t) = O(n\varepsilon_n)$,  $\max_{0 \le t \le 2\pi} \alpha'(t) = n + O(n\varepsilon_n)$,  (4.6)

and even something much more specific, namely that the normalized angular
speed $\alpha'(t)/n$ is, asymptotically, uniformly distributed in $[0, 1]$. (The precise
definition is given below).

At this point let us formulate the main conjecture and, in Section 5, our
partial results that support it. (Recall that $P \in S_n$ is supposed $(\varepsilon_n)$-ultraflat,
i.e., that (3.1) holds).

Conjecture (Uniform distribution conjecture for the angular speed). In
the interval $0 \le t \le 2\pi$, the distribution of the normalized angular speed
$\alpha'(t)/n$ converges to the uniform distribution as $n \to \infty$. More precisely, for
any $x \in [0, 1]$ we have

$\operatorname{meas}\{t \in [0, 2\pi] : 0 \le \alpha'(t) \le nx\} = 2\pi x + O(\varepsilon_n)$.  (4.7)

For the special ultraflat polynomials produced by Kahane [7], (4.7) is indeed
true. (See [11]).

In the general case (4.7) can, by integration, be reformulated (equivalently)
in terms of the moments of the angular speed $\alpha'(t)$.

Conjecture (Reformulation of the uniform distribution conjecture). For
any $q > 0$ we have

$\frac{1}{2\pi}\int_0^{2\pi} |\alpha'(t)|^q\,dt = \frac{n^q}{q+1} + O(n^q \varepsilon_n)$.  (4.8)
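The leading constants in the moment form of the conjecture are simply the moments of the uniform distribution on $[0, 1]$, namely $\int_0^1 x^q\,dx = 1/(q+1)$. The short sketch below (the function name is an illustrative assumption) evaluates the two cases that reappear in Section 5, $q = 2$ and $q = 4$, together with the gap $1/5 - (1/3)^2$ that later bounds $\|\alpha''\|_2^2 / n^4$ in (5.10).

```python
from fractions import Fraction


def uniform_moment(q):
    """q-th moment of the uniform distribution on [0, 1]: integral of x^q."""
    return Fraction(1, q + 1)


second = uniform_moment(2)   # leading constant of ||alpha'||_2^2 / n^2
fourth = uniform_moment(4)   # leading constant of ||alpha'||_4^4 / n^4
gap = fourth - second ** 2   # what is left for ||alpha''||_2^2 / n^4
print(second, fourth, gap)
```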


We also have the following closely related (and, in some sense, stronger)
conjecture. It says that the angular acceleration $\alpha''(t)$ and the other higher
derivatives of $\alpha(t)$ are $L^2$-negligible, i.e., have very small norms on $[0, 2\pi]$:

Conjecture (Negligibility conjecture for higher derivatives). For every
integer $r \ge 2$, the derivative $\alpha^{(r)}$ of order $r$ satisfies

$\frac{1}{2\pi}\int_0^{2\pi} |\alpha^{(r)}(t)|^2\,dt = O_r(n^{2r} \varepsilon_n)$.  (4.9)

One can prove that (4.9) implies (4.8), but with error $O_q(n^q \varepsilon_n)$. The above
conjectures (4.7), (4.8) and (4.9) suggest that, very roughly speaking, the unit
circle can be divided into a small number of arcs ("small" meaning bounded
or $O_\delta(n^\delta)$ for all $\delta > 0$) such that, on each such arc $S$, the circular motion of
$P(e^{it})$ is either (approximately) uniformly accelerated or uniformly retarded;
in other words, the angular speed might have the form

$\alpha'(t) = a\,n\,t + b + \nu(t)$  (for $t \in S$)  (4.10)

where $a = a(S)$ and $b = b(S)$ are "constants" depending only on $S$, and
$\nu(t) = \nu(S, t)$ is a "noise term" all of whose derivatives have negligible
$L^2$ norms.

For the special ultraflat polynomials produced by Kahane [7], the phenomenon
(4.10) is indeed the case. (See [11]).

The results we state in the next section, and prove in Sections 6 through
11, partially confirm the above conjectures.

5. The phase problem: statements of results


In this section, we state some results which will be proved in Sections 6
through 11. We keep all the notation of the previous sections. In particular
$P \in S_n$ is $(\varepsilon_n)$-ultraflat.


Theorem 5.1. Let $C$ denote the (closed) trajectory of $P(e^{it})$ as $t$ runs over
$[0, 2\pi]$. Then the average value of the angular speed $\alpha'(t)$ on $[0, 2\pi]$ (which
is also the winding number of $C$ with respect to the origin) is given by

$\frac{1}{2\pi}\int_0^{2\pi} \alpha'(t)\,dt = \frac{1}{2\pi}\int_0^{2\pi} \frac{e^{it}P'(e^{it})}{P(e^{it})}\,dt = \frac{n}{2} + O(n\varepsilon_n)$.  (5.1)


Theorem 5.2. We have

$O(n\varepsilon_n) \le \min_{0 \le t \le 2\pi} \alpha'(t) \le n/3 + O(n\varepsilon_n)$  (5.2)

and

$2n/3 + O(n\varepsilon_n) \le \max_{0 \le t \le 2\pi} \alpha'(t) \le n + O(n\varepsilon_n)$.  (5.3)


Remark 5.3. If the "uniform distribution conjecture" (4.7) is true, then it
immediately follows that (5.2) and (5.3) can be improved to

$\min_{0 \le t \le 2\pi} \alpha'(t) = O(n\varepsilon_n)$ and $\max_{0 \le t \le 2\pi} \alpha'(t) = n + O(n\varepsilon_n)$.  (5.4)


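Theorem 5.4. Put $\|\varphi\|_q = \bigl(\frac{1}{2\pi}\int_0^{2\pi} |\varphi(t)|^q\,dt\bigr)^{1/q}$ if $\varphi \in L^q(0, 2\pi)$ and $q \ge 1$.
Then

$\|\alpha'\|_2^2 = n^2/3 + O(n^2 \varepsilon_n)$  (5.5)

and

$\|\alpha'\|_4^4 + \|\alpha''\|_2^2 = n^4/5 + O(n^4 \varepsilon_n)$.  (5.6)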

We see that the relations (5.1), (5.2), (5.3) and (5.5) are all partial
confirmations of the "uniform distribution conjectures" (4.7) and (4.8). Relation
(5.6) is also consistent with the conjectures (4.8) and (4.9). Indeed, (4.8) and
(4.9) would respectively imply

$\|\alpha'\|_4^4 = n^4/5 + O(n^4 \varepsilon_n)$  (case $q = 4$ of Conjecture (4.8))  (5.7)

$\|\alpha''\|_2^2 = O(n^4 \varepsilon_n)$  (case $r = 2$ of Conjecture (4.9))  (5.8)

and we see that relation (5.6) is obtained by adding up the conjectural
relations (5.7) and (5.8).

Now (5.5) and the trivial inequality $\|\alpha'\|_4 \ge \|\alpha'\|_2$ imply

$\|\alpha'\|_4^4 \ge n^4/9 + O(n^4 \varepsilon_n)$  (5.9)

and, equivalently, in view of (5.6),

$\|\alpha''\|_2^2 \le \frac{4}{45}\,n^4 + O(n^4 \varepsilon_n)$.  (5.10)

Of course (5.9) and (5.10) are very poor compared to the optimal (conjectural)
relations (5.7) and (5.8). But one further "partial confirmation" of our above
conjectures is the following improvement of the trivial inequalities (5.9)
and (5.10):

Theorem 5.5. There is an (effectively computable) absolute constant $\gamma > 0$
such that the trivial inequalities (5.9) and (5.10) can be improved to the
respective (equivalent) relations

$\|\alpha'\|_4^4 \ge \bigl(\tfrac{1}{9} + \gamma\bigr)\,n^4 + O(n^4 \varepsilon_n)$  (5.11)

and

$\|\alpha''\|_2^2 \le \bigl(\tfrac{4}{45} - \gamma\bigr)\,n^4 + O(n^4 \varepsilon_n)$.  (5.12)

In Section 11 we shall prove the existence of such a constant $\gamma$, but
will not compute a numerical value for it, because further refinements of our
present method (to be written out later) will provide better numerical values.

6.  Some  preliminary  estimates 

We first show that the sequence $(\varepsilon_n)$ in the flatness condition (3.1) necessarily
satisfies

$\varepsilon_n \ge n^{-1} + O(n^{-2})$.  (6.1)

(This can be improved, but all we need here is to know that $\varepsilon_n \ge K n^{-1}$ for
some absolute constant $K > 0$). To prove (6.1), write

$f(t) = |P(e^{it})|^2 = \sum_{k=-n}^{n} c_k e^{ikt}$  ($c_{-k} = \overline{c_k}$)  (6.2)



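and note that $c_0 = n + 1$ and $c_n = \overline{a_0}\,a_n$, so $|c_n| = 1$. Also write

$g(t) = |P(e^{it})|^2 - (n + 1) = \sum_{0 < |k| \le n} c_k e^{ikt}$  (6.3)

and

$h(t) = \frac{1}{n}\sum_{j=0}^{n-1} g\bigl(t + \tfrac{2\pi j}{n}\bigr) = \overline{c_n}\,e^{-int} + c_n e^{int} = 2\cos(nt + \varphi)$  (6.4)

where $c_n = e^{i\varphi}$. (Averaging $g$ over these $n$ shifts keeps only the terms
with $k = \pm n$, so $\|h\|_\infty \le \|g\|_\infty$.) Then by (6.3), (6.4) and (3.1),

$2 = \|h\|_\infty \le \|g\|_\infty \le (n + 1)(2\varepsilon_n + \varepsilon_n^2)$,  (6.5)

whence

$\varepsilon_n \ge \bigl(1 + \tfrac{2}{n+1}\bigr)^{1/2} - 1 = n^{-1} + O(n^{-2})$  (6.6)

which proves (6.1).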
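As a quick numerical illustration of the lower bound just obtained, the sketch below (function name chosen here for illustration) evaluates $(1 + 2/(n+1))^{1/2} - 1$ and compares it with $1/n$. The product $n \cdot \text{bound}$ approaches 1 from below as $n$ grows, in line with the expansion $n^{-1} + O(n^{-2})$.

```python
import math


def epsilon_lower_bound(n):
    """Lower bound (1 + 2/(n+1))^(1/2) - 1 for epsilon_n, cf. (6.6)."""
    return math.sqrt(1.0 + 2.0 / (n + 1)) - 1.0


for n in (10, 100, 1000, 10000):
    bound = epsilon_lower_bound(n)
    # n * bound should approach 1, reflecting bound = 1/n + O(1/n^2)
    print(n, bound, n * bound)
```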

In all the $O$ estimates below, we are using the fact that, by (6.1), uniformly
bounded functions are $O(n\varepsilon_n)$. First rewrite (3.1) in the cruder form

$R(t) = \sqrt{n} + O(\sqrt{n}\,\varepsilon_n)$ where $R(t) = |P(e^{it})|$.  (6.7)


We shall also need estimates for $R'(t)$ and $R''(t)$. (In the rest of this paper we
suppose $n$ large enough so that $P(z)$ has no root on the unit circle: $P(e^{it}) \ne 0$,
so $R(t) > 0$ and $f(t) > 0$). Once again, by (3.1) or (6.7),

$f(t) = n + O(n\varepsilon_n)$ and $g(t) = O(n\varepsilon_n)$,  (6.8)

so Bernstein's inequality applied to the trigonometric polynomial
$g(t)$ implies

$\|f'\|_\infty = \|g'\|_\infty = O(n^2 \varepsilon_n)$ and $\|f''\|_\infty = \|g''\|_\infty = O(n^3 \varepsilon_n)$.  (6.9)

Now $R(t) = \sqrt{f(t)}$, hence

$R'(t) = \frac{f'(t)}{2\sqrt{f(t)}} = \frac{O(n^2 \varepsilon_n)}{2\sqrt{n} + O(\sqrt{n}\,\varepsilon_n)} = O(n^{3/2} \varepsilon_n)$  (6.10)

and similarly

$R''(t) = \frac{2f(t)f''(t) - (f'(t))^2}{4(f(t))^{3/2}} = \frac{O(n^4 \varepsilon_n)}{4n^{3/2} + O(n^{3/2} \varepsilon_n)} = O(n^{5/2} \varepsilon_n)$.  (6.11)
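7. Proof of Theorem 5.1


Differentiate (4.1):

$i e^{it} P'(e^{it}) = R'(t)\,e^{i\alpha(t)} + i\alpha'(t) R(t)\,e^{i\alpha(t)}$  (7.1)

whence

$\frac{e^{it} P'(e^{it})}{P(e^{it})} = \alpha'(t) - i\,\frac{R'(t)}{R(t)}$.  (7.2)

Since $\alpha'(t)$ is real, on integrating (7.2) over $[0, 2\pi]$ we have

$\frac{1}{2\pi}\int_0^{2\pi} \alpha'(t)\,dt = \frac{1}{2\pi}\int_0^{2\pi} \frac{e^{it} P'(e^{it})}{P(e^{it})}\,dt$  (7.3)

(winding number of $C$ around the origin). Of course (7.3) is a well known
elementary result. We now prove (5.1). By (3.1) and by Bernstein's inequality
applied to our $(\varepsilon_n)$-ultraflat polynomial

$P(z) = \sum_{k=0}^{n} a_k z^k$  (7.4)

we have

$e^{it} P'(e^{it})\,\overline{P(e^{it})} = O(n^2)$,

hence by (6.1), (6.8), (7.3) and (7.4),

$\frac{1}{2\pi}\int_0^{2\pi} \alpha'(t)\,dt = \frac{1}{2\pi}\int_0^{2\pi} \frac{e^{it} P'(e^{it})\,\overline{P(e^{it})}}{n + O(n\varepsilon_n)}\,dt$

$= \frac{1}{n} \cdot \frac{1}{2\pi}\int_0^{2\pi} e^{it} P'(e^{it})\,\overline{P(e^{it})}\,dt + O(n\varepsilon_n)$

$= \frac{1}{n}\sum_{k=0}^{n} k\,|a_k|^2 + O(n\varepsilon_n)$

$= \frac{1}{n} \cdot \frac{n(n+1)}{2} + O(n\varepsilon_n) = \frac{n}{2} + O(n\varepsilon_n)$

which proves (5.1) and Theorem 5.1.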
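The identity behind this proof, that the mean of $\alpha'(t) = \operatorname{Re}\bigl(e^{it}P'(e^{it})/P(e^{it})\bigr)$ over a period equals the winding number of the trajectory around the origin, is easy to check numerically. The sketch below is only an illustration under stated assumptions: the function names and grid size are chosen here, and the two test polynomials ($z - 1/2$ and $z - 2$) are not unimodular; they are picked because their winding numbers, 1 and 0, are known from the location of the root.

```python
import cmath
import math


def angular_speed(coeffs, t):
    """alpha'(t) = Re( e^{it} P'(e^{it}) / P(e^{it}) ) for P(z) = sum a_k z^k."""
    z = cmath.exp(1j * t)
    p = sum(a * z ** k for k, a in enumerate(coeffs))
    z_dp = sum(k * a * z ** k for k, a in enumerate(coeffs))  # z * P'(z)
    return (z_dp / p).real


def mean_angular_speed(coeffs, grid=4096):
    """Average of alpha' over [0, 2*pi] by the rectangle rule on a uniform grid."""
    total = sum(angular_speed(coeffs, 2 * math.pi * j / grid) for j in range(grid))
    return total / grid


print(mean_angular_speed([-0.5, 1.0]))  # root inside the unit disc
print(mean_angular_speed([-2.0, 1.0]))  # root outside the unit disc
```

For an ultraflat polynomial of degree $n$, Theorem 5.1 says that this same average is $n/2 + O(n\varepsilon_n)$.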
8. Proof of part of Theorem 5.2

Here we only prove the inequalities

$O(n\varepsilon_n) \le \min_{0 \le t \le 2\pi} \alpha'(t) \le \max_{0 \le t \le 2\pi} \alpha'(t) \le n + O(n\varepsilon_n)$.  (8.1)

(The proof of the rest of Theorem 5.2 uses Theorem 5.4, and will be given in
Section 10.) By (6.7), (7.2) and Bernstein's inequality applied to $P$, we have

$\alpha'(t) = \operatorname{Re}\Bigl(\frac{e^{it} P'(e^{it})}{P(e^{it})}\Bigr) \le \frac{|P'(e^{it})|}{|P(e^{it})|} \le \frac{n^{3/2} + O(n^{3/2} \varepsilon_n)}{\sqrt{n} + O(\sqrt{n}\,\varepsilon_n)} = n + O(n\varepsilon_n)$  (8.2)

which proves the rightmost inequality in (8.1). Now consider the "inverse"
$P^*$ of $P$, i.e., the polynomial

$P^*(z) = \sum_{k=0}^{n} \overline{a_{n-k}}\,z^k$.  (8.3)

Obviously $P^* \in S_n$ and is also $(\varepsilon_n)$-ultraflat, since $|P^*(z)| = |P(z)|$ on the unit
circle $|z| = 1$. By (8.3),

$P^*(e^{it}) = R(t)\,e^{i(nt - \alpha(t))}$.  (8.4)

So, by applying the conclusion of (8.2) to $P^*$, with $\alpha(t)$ of course replaced by
$nt - \alpha(t)$, we have

$n - \alpha'(t) \le n + O(n\varepsilon_n)$

which proves the leftmost inequality in (8.1).


9.  Proof  of  Theorem  5.4 

By (7.1) we have

$|P'(e^{it})|^2 = |R(t)\alpha'(t)|^2 + |R'(t)|^2$.  (9.1)

Now, by (7.4),

$\frac{1}{2\pi}\int_0^{2\pi} |P'(e^{it})|^2\,dt = \sum_{k=0}^{n} k^2 = n(n+1)(2n+1)/6$

while, by (6.7) and (6.10),

$\frac{1}{2\pi}\int_0^{2\pi} \bigl(|R(t)\alpha'(t)|^2 + |R'(t)|^2\bigr)\,dt = n(1 + O(\varepsilon_n)) \cdot \|\alpha'\|_2^2 + O(n^3 \varepsilon_n^2)$.

Comparing the last two equalities yields (5.5). To prove (5.6), differentiate
again (7.1) and then take the squares of the moduli of both sides, to obtain

$|P'(e^{it}) + e^{it} P''(e^{it})|^2 = |R''(t) - (\alpha'(t))^2 R(t)|^2 + |2\alpha'(t) R'(t) + \alpha''(t) R(t)|^2$.  (9.2)

Integrate both sides of (9.2) over $[0, 2\pi]$ (with respect to $dt/2\pi$). By (7.4), the
integral of the left side is

$\frac{1}{2\pi}\int_0^{2\pi} |P'(e^{it}) + e^{it} P''(e^{it})|^2\,dt = \sum_{k=0}^{n} k^4 = \frac{n^5}{5} + O(n^4)$  (9.3)

while that of the right hand side is

$\bigl(\|\alpha'\|_4^4 + \|\alpha''\|_2^2\bigr) \cdot n(1 + O(\varepsilon_n)) + O(n^5 \varepsilon_n)$  (9.4)

in view of (6.7), (6.10), (6.11), (8.1) and the estimate $\alpha''(t) = O(n^2)$ which
can be obtained by differentiating (7.2) and by using Bernstein's inequality.
Comparing (9.3) and (9.4) yields (5.6) and completes the proof of
Theorem 5.4.
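The two Parseval sums used in this proof can be checked directly. In the sketch below (names, degree, seed and grid size are illustrative assumptions) a random unimodular polynomial is sampled on a uniform grid finer than its degree, so that the grid averages of $|zP'(z)|^2$ and $|zP'(z) + z^2P''(z)|^2$ reproduce $\sum k^2$ and $\sum k^4$ up to rounding.

```python
import cmath
import math
import random


def mean_square_on_circle(coeffs, grid):
    """Average of |Q(e^{it})|^2 over a uniform grid, for Q(z) = sum coeffs[k] z^k."""
    total = 0.0
    for j in range(grid):
        z = cmath.exp(2j * math.pi * j / grid)
        total += abs(sum(c * z ** k for k, c in enumerate(coeffs))) ** 2
    return total / grid


random.seed(0)
n = 12
a = [cmath.exp(1j * random.uniform(0.0, 2.0 * math.pi)) for _ in range(n + 1)]
first = [k * a_k for k, a_k in enumerate(a)]        # coefficients of z P'(z)
second = [k * k * a_k for k, a_k in enumerate(a)]   # coefficients of z P' + z^2 P''
sum_k2 = mean_square_on_circle(first, grid=64)
sum_k4 = mean_square_on_circle(second, grid=64)
print(sum_k2, n * (n + 1) * (2 * n + 1) / 6)
print(sum_k4, sum(k ** 4 for k in range(n + 1)))
```

Only the moduli $|a_k| = 1$ matter here; the random phases drop out, which is why these two integrals do not depend on the particular ultraflat polynomial.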


10.  End  of  proof  of  Theorem  5.2 

We adopt an idea already used in [12]. Write

$m = \min_{0 \le t \le 2\pi} \alpha'(t)$ and $M = \max_{0 \le t \le 2\pi} \alpha'(t)$.  (10.1)

To finish the proof of Theorem 5.2, it remains to show that

$m \le n/3 + O(n\varepsilon_n)$ and $M \ge 2n/3 + O(n\varepsilon_n)$.  (10.2)

Put $\psi(t) = \alpha'(t) - m$ for $0 \le t \le 2\pi$. Then, by (5.1), (5.5) and (8.1),

$\frac{1}{2\pi}\int_0^{2\pi} |\psi(t)|^2\,dt = \|\alpha'\|_2^2 + m^2 - 2m \cdot \frac{1}{2\pi}\int_0^{2\pi} \alpha'(t)\,dt = n^2/3 + m^2 - mn + O(n^2 \varepsilon_n)$.  (10.3)

Now, since $0 \le \psi(t) \le M - m$, we have by (5.1):

$\frac{1}{2\pi}\int_0^{2\pi} |\psi(t)|^2\,dt \le (M - m) \cdot \frac{1}{2\pi}\int_0^{2\pi} \psi(t)\,dt = (M - m)\bigl(n/2 - m + O(n\varepsilon_n)\bigr)$.  (10.4)

Comparing (10.3) and (10.4), we have again by (8.1)

$n^2/3 - mn/2 \le (n/2 - m)M + O(n^2 \varepsilon_n)$  (10.5)

and since by (5.1) we have $m \le n/2 + O(n\varepsilon_n) \le M$, (10.5) implies (10.2)
and completes the proof of Theorem 5.2.

11. Proof of Theorem 5.5

The  proof  is  based  on  the  following  lemma. 

Lemma. (Concentration on low degrees for real trigonometric polynomials
with small $L^4$ norms). For every $\delta_1 > 0$ and $\delta_2 > 0$, no matter how small,
there exists an effectively computable $\varepsilon = \varepsilon(\delta_1, \delta_2) > 0$ and an integer
$m_0 = m_0(\delta_1, \delta_2)$ such that whenever a real trigonometric polynomial

$F(t) = \sum_{k=0}^{m} (A_k \cos kt + B_k \sin kt)$

satisfies $m \ge m_0$ and

$\|F\|_4 \le (1 + \varepsilon)\|F\|_2$,  (11.1)

then

$\sum_{k \ge \delta_1 m} (A_k^2 + B_k^2) \le \delta_2 \sum_{k=0}^{m} (A_k^2 + B_k^2)$.  (11.2)


Proof (of the lemma). It is more convenient to rewrite $F(t)$ in the form

$F(t) = \sum_{k=-m}^{m} b_k e^{ikt}$  ($b_k \in \mathbb{C}$, $b_{-k} = \overline{b_k}$)

and put $F_\rho(e^{it}) = \sum_{k=-m}^{m} b_k \rho^k e^{ikt}$ where $\rho = e^{\lambda/m}$ and $\lambda$ is a real parameter
at our disposal. Write also:

$(F(t))^2 = \sum_{h=-2m}^{2m} d_h e^{iht}$  ($d_{-h} = \overline{d_h}$ for all $h$).

Then $d_0 = \sum_{k=-m}^{m} |b_k|^2 = \|F\|_2^2$ and

$\|F\|_4^4 = \sum_{h=-2m}^{2m} |d_h|^2 = d_0^2 + 2\sum_{h=1}^{2m} |d_h|^2$.

Thus (11.1) implies

$\sum_{h=1}^{2m} |d_h|^2 \le c_1 d_0^2$ with $c_1 = (1 + \varepsilon)^4 - 1$.  (11.3)
Now $(F_\rho(e^{it}))^2 = \sum_{h=-2m}^{2m} d_h \rho^h e^{iht}$, whence

$\|F_\rho\|_4^4 = \sum_{h=-2m}^{2m} |d_h|^2 \rho^{2h} = d_0^2 + \sum_{h=1}^{2m} |d_h|^2 (\rho^{2h} + \rho^{-2h}) \le d_0^2\,\bigl(1 + 2c_1 \cosh(4\lambda)\bigr)$  (11.4)

assuming (11.1) and therefore (11.3). Also

$\|F_\rho\|_2^2 = \sum_{k=-m}^{m} |b_k|^2 \rho^{2k} \ge \sum_{k \ge m\delta_1} |b_k|^2 (\rho^{2k} + \rho^{-2k}) \ge \cosh(2\lambda\delta_1) \sum_{|k| \ge m\delta_1} |b_k|^2$.  (11.5)

By (11.4) and (11.5) (and $\|F_\rho\|_2 \le \|F_\rho\|_4$) we see that (11.1) implies

$\Bigl(\sum_{|k| \ge m\delta_1} |b_k|^2\Bigr) \Big/ \Bigl(\sum_{k=-m}^{m} |b_k|^2\Bigr) \le \frac{\bigl(1 + 2c_1 \cosh(4\lambda)\bigr)^{1/2}}{\cosh(2\lambda\delta_1)}$.  (11.6)

For any fixed $c_1 > 0$ and any fixed $\delta_1$ $(0 < \delta_1 < 1)$, the right hand side
of (11.6) tends to infinity as $\lambda \to \pm\infty$; therefore, as $\lambda$ varies in $\mathbb{R}$, the
right hand side of (11.6) has a minimum $M(c_1, \delta_1)$. For fixed $\delta_1$, we have
$\lim_{c_1 \to 0} M(c_1, \delta_1) = 0$. So, for any fixed $\delta_2 > 0$ and any $c_1 > 0$ sufficiently small,
the right hand side of (11.6) is $\le \delta_2$ for some choice of $\lambda = \lambda(c_1, \delta_1, \delta_2)$. This
completes the proof of the lemma.
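Proof (of Theorem 5.5). The idea is to use the fact that

$\alpha'(t) = F(t) + O(n\varepsilon_n)$ and $\alpha''(t) = F'(t) + O(n^2 \varepsilon_n)$  (11.7)

where $F(t)$ is the trigonometric polynomial $n^{-1} \operatorname{Re}\bigl(e^{it} P'(e^{it})\,\overline{P(e^{it})}\bigr)$, and
to apply the lemma to this $F$ (of degree $m = 2n$). The two estimates (11.7)
follow from (7.2) and the error estimates of Section 6 via Bernstein's inequality.
The proof of Theorem 5.5 is by contradiction.

Suppose Theorem 5.5 is not true. Then, for any fixed $\delta_1 > 0$ and $\delta_2 > 0$,
and any $\delta$ with $0 < \delta < \varepsilon(\delta_1, \delta_2)$ (where $\varepsilon = \varepsilon(\delta_1, \delta_2)$ is that of the lemma)
and any sufficiently large $n$, we have the (equivalent) inequalities

$\|\alpha'\|_4^4 \le \bigl(\tfrac{1}{9} + \delta\bigr)\,n^4 + O(n^4 \varepsilon_n) = (1 + 9\delta)\|\alpha'\|_2^4 + O(n^4 \varepsilon_n)$  (11.8)

and

$\|\alpha''\|_2^2 \ge \bigl(\tfrac{4}{45} - \delta\bigr)\,n^4 + O(n^4 \varepsilon_n)$.  (11.9)

For sufficiently small $\delta$ and sufficiently large $m = 2n$, we have, by (11.7),
(11.8) and the lemma,

$\|F'\|_2^2 = \tfrac{1}{2}\sum_{k=1}^{m} k^2 (A_k^2 + B_k^2) = \tfrac{1}{2}\sum_{k < m\delta_1} k^2 (A_k^2 + B_k^2) + \tfrac{1}{2}\sum_{k \ge m\delta_1} k^2 (A_k^2 + B_k^2)$

$\le \tfrac{1}{2}\,m^2 \delta_1^2 \sum_{k < m\delta_1} (A_k^2 + B_k^2) + \tfrac{1}{2}\,m^2 \sum_{k \ge m\delta_1} (A_k^2 + B_k^2)$

$\le m^2 \delta_1^2 \|F\|_2^2 + m^2 \delta_2 \|F\|_2^2 = 4n^2 (\delta_1^2 + \delta_2)\|F\|_2^2$

and therefore, by (11.7) again,

$\|\alpha''\|_2^2 \le 4n^2 (\delta_1^2 + \delta_2)\|\alpha'\|_2^2 + O(n^4 \varepsilon_n) = 4(\delta_1^2 + \delta_2)\,n^4/3 + O(n^4 \varepsilon_n)$,

which contradicts (11.9) for sufficiently small $\delta$, $\delta_1$ and $\delta_2$. This proves
Theorem 5.5.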

12.  Miscellaneous  remarks 

Remark 12.1. In the fall of 197^^ J.-P. Kahane gave me a preprint of his
paper [7] (I already had a preprint of Körner [8]). I immediately coined
the term "flat" for Körner-type polynomials and "ultraflat" for Kahane-type
ones. But afterwards I heard exactly these same terms from such unrelated
sources that obviously several people must have coined these same words
at the same time!


Remark 12.2. The (a priori arbitrary) $(\varepsilon_n)$-ultraflat polynomials $P \in S_n$
have hidden and interesting properties worth studying. (Those produced
by Kahane [7] are excellent models for testing ideas and conjectures). Let
$P \in S_n$ be $(\varepsilon_n)$-ultraflat, let $P^* \in S_n$ denote its "inverse" and ${}^*P \in S_n$ its
"reverse" (cf. (8.3)):

$P^*(z) = \sum_{k=0}^{n} \overline{a_{n-k}}\,z^k = z^n\,\overline{P(1/\overline{z})}$

${}^*P(z) = \sum_{k=0}^{n} a_{n-k}\,z^k = z^n P(1/z)$.

Clearly both $P^*$ and ${}^*P$ are $(\varepsilon_n)$-ultraflat, and we conjecture the following
"near orthogonality" properties of the pairs $(P, P^*)$ and $({}^*P, P)$:

$\sum_{k=0}^{n} a_k a_{n-k} = o(n)$  (as $n \to \infty$)  (12.1)

$\sum_{k=0}^{n} a_k \overline{a_{n-k}} = o(n)$  (as $n \to \infty$).  (12.2)

Conjectures (12.1) and (12.2) are much stronger than are the respective
statements:

1) "P is nowhere near being self-inversive" (P self-inversive means $P = P^*$,
that is, $\overline{a_{n-k}} = a_k$ for all $k$).

2) "P is nowhere near being symmetric (or palindromic)" (P symmetric
(or palindromic) means $P = {}^*P$, that is, $a_{n-k} = a_k$ for all $k$).
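The inverse and reverse are simple operations on the coefficient list, and the statement that both have the same modulus profile as $P$ on the unit circle can be verified numerically. This is a minimal sketch with assumed helper names (`evaluate`, `inverse`, `reverse`) and a random unimodular test polynomial; it checks $|P^*(e^{it})| = |P(e^{it})|$ and $|{}^*P(e^{it})| = |P(e^{-it})|$, the latter following from ${}^*P(z) = z^n P(1/z)$.

```python
import cmath
import math
import random


def evaluate(coeffs, z):
    """Horner evaluation of P(z) = sum coeffs[k] z^k."""
    value = 0j
    for a in reversed(coeffs):
        value = value * z + a
    return value


def inverse(coeffs):
    """Coefficients of P*(z): the k-th coefficient is conj(a_{n-k})."""
    return [a.conjugate() for a in reversed(coeffs)]


def reverse(coeffs):
    """Coefficients of *P(z): the k-th coefficient is a_{n-k}."""
    return list(reversed(coeffs))


random.seed(1)
a = [cmath.exp(1j * random.uniform(0.0, 2.0 * math.pi)) for _ in range(9)]
z = cmath.exp(0.37j)
print(abs(evaluate(a, z)), abs(evaluate(inverse(a), z)))
print(abs(evaluate(a, 1 / z)), abs(evaluate(reverse(a), z)))
```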

The truth of Item 1 is a consequence of our paper [6]. In fact, a trivial corollary
to the results of [6] is that, if $P \in S_n$ is ultraflat, then

$\Bigl|\sum_{k=0}^{n} a_k a_{n-k}\Bigr| \le \frac{5}{6}\,n + o(n)$  ($n \to \infty$)  (12.3)

and I can further decrease the constant 5/6 in (12.3) (though I am unable to
replace it by zero, otherwise I would have a proof of (12.1)). But Item 2 is an
open problem, and so is its following consequence:

Conjecture (Weak form of Item 2 of Remark 12.2). An ultraflat $P \in S_n$
cannot be symmetric (or palindromic), i.e., we cannot have $P = {}^*P$.


Remark  12.3.  Here  is  another  type  of  open  problem. 

Conjecture. Let $E$ be any fixed subset of the unit circle which is not
everywhere dense. Then there cannot exist ultraflat polynomials all of whose
coefficients are in $E$.

It would be most desirable to have a proof of this conjecture for $E$ finite,
or even when $E$ is the group of $d$-th roots of 1 for some $d \ge 2$. (This is
not known for any $d \ge 2$. The case $d = 2$ is that of the $\pm 1$ polynomials).
But J. Beck [2] has combined the existence of Kahane's ultraflat polynomials
with other ingenious ideas to give a proof of the following improvement of
Körner's theorem:


Theorem 12.4 (Beck) [2]. For fixed $d \ge 300$ and sufficiently large $n$, there
are polynomials (of degree $n$) all of whose coefficients are $d$-th roots of 1 and
which satisfy Littlewood's condition (2.3).


Remark  12.5. 


Conjecture (A very general one (containing the previous one)). Let
$E \subset \mathbb{C}$, $E \ne \{0\}$, be such that the closure $\overline{E}$ of $E$ contains no circle of centre
origin and radius $> 0$. (For example, any finite set not reduced to the origin.)
Let $P$ be any polynomial all of whose coefficients are in $E$ and having at least
two nonzero coefficients. Put

$\|P\|_q = \Bigl(\frac{1}{2\pi}\int_0^{2\pi} |P(e^{it})|^q\,dt\Bigr)^{1/q}$  if $-\infty < q < \infty$ and $q \ne 0$,

$\|P\|_\infty = \max_{0 \le t \le 2\pi} |P(e^{it})|$,  $\|P\|_{-\infty} = \min_{0 \le t \le 2\pi} |P(e^{it})|$,

$\|P\|_0 = \exp\Bigl(\frac{1}{2\pi}\int_0^{2\pi} \log |P(e^{it})|\,dt\Bigr)$.

I conjecture that, whenever $-\infty \le p < q \le \infty$, we have

where  C(p,a  i  >  0  depends  only  on  p  and  q. 


Remark 12.6. There are several other related matters, in particular some
interesting problems of D.J. Newman about ultraflat polynomials. I shall
deal with them elsewhere.


13.  Last  minute  addendum 

A  few  days  after  submitting  this  paper,  I  made  these  three  observations: 

1) Theorem 5.5, as it is stated, is not interesting unless the actual value
of $\gamma$ is computed. Indeed, Hölder's inequality implies the statement
of Theorem 5.5 with $\gamma = 1/27$. But in a forthcoming paper, we shall
see that the method of proof of Theorem 5.5 gives a constant $\gamma$ larger
than 1/27.

2) One can obtain (10.2) in a straightforward manner, just by the "degenerate
Hölder inequality" majorizing the 2-norm in terms of the 1-norm
and the sup-norm.

3) The "negligibility conjecture" for the acceleration alone (case $r = 2$)
implies the negligibility conjecture for all $r > 2$, because of Bernstein's
inequality, and the approximation relation (11.7) and its analogues for
higher derivatives. It would therefore suffice to settle the case $r = 2$.

Families and sequences of zero-crossing counts generated by parametric
time invariant linear filters are called higher order crossings or HOC.
Because of the close relationship between zero-crossing counts and first order
autocorrelations, families of first order autocorrelations are also referred
to as HOC. We investigate the HOC from some particular families of linear
filters applied in the problem of multiple frequency detection in noise.
Viewing the cosine of each discrete frequency as a fixed point of a certain
mapping, it is shown how to construct HOC sequences that converge to
the fixed points. A faster convergence rate is achieved by controlling the
bandwidth of the parametric filters.

1.  Introduction 

1.1.  The  general  idea  of  HOC 

In general, when a filter is applied to a time series, it changes the series' mode
of oscillation. Thus, when a bank of filters is applied to the same series, we
obtain a sequence or family of oscillation patterns. The resulting family of
zero-crossing counts is referred to as higher order crossings or simply HOC.
The corresponding first order autocorrelations are referred to as higher order
correlations, or simply HOC again. Because the first order autocorrelation
and the expected zero-crossing rate of a real valued stationary time series
are essentially equivalent, using the same acronym is quite tolerable. The
particular HOC under consideration should be clear from the context.

This paper shows how to construct convergent HOC sequences for the
purpose of multiple frequency estimation in the presence of ambient noise.

The gist of the idea is to employ HOC sequences in the fine tuning
of parametric filters. This is done iteratively as follows. A time series is
filtered by a parametric filter, and the resulting first order autocorrelation is
immediately used in adjusting the filter parameter. The adjusted filter is then
applied again, giving rise to a new first order autocorrelation, and the
procedure is repeated. By choosing the filters appropriately, the scheme gives
convergent sequences of higher order correlations, or equivalently, convergent
sequences of higher order crossings, depending on what one chooses to
observe, correlations or zero-crossing counts. From a statistical point of view,
under appropriate conditions, the method guarantees the strong consistency
(almost sure convergence) of the estimating HOC sequences.

To express the same idea in symbols, let $\{Z_t\}$, $t = 0, \pm 1, \pm 2, \ldots$, be a
zero-mean stationary time series, and let $\{\mathcal{L}_\theta\}$, $\theta \in \Theta$, be a parametric family
of time invariant linear filters. Denote by $\{Z_t(\theta)\}$ the filtered series,

$Z_t(\theta) = \mathcal{L}_\theta(Z)_t$.

Then $\{\rho_1(\theta)\}$, $\theta \in \Theta$, defined by

$\rho_1(\theta) = \frac{\Re\{E[Z_t(\theta)\,\overline{Z_{t-1}(\theta)}]\}}{E|Z_t(\theta)|^2}$

is a HOC family defined from a parametrized first order autocorrelation.
Here and elsewhere, a bar denotes complex conjugate. For a real valued
process $\{Z_t(\theta)\}$, let $I_{[A]}$ be the indicator of the event $A$, and define

$D_\theta = \sum_{t=2}^{N} I_{[Z_t(\theta) Z_{t-1}(\theta) < 0]}$

as the number of zero-crossings observed in $Z_1(\theta), Z_2(\theta), \ldots, Z_N(\theta)$. This is
the corresponding HOC family from zero-crossings. When $\{Z_t\}$ is a strictly
stationary pure sinusoid, or when $\{Z_t\}$ is Gaussian, then

$\rho_1(\theta) = \cos\Bigl(\frac{\pi E(D_\theta)}{N - 1}\Bigr)$  (1.1)

and we can see that in this real case knowing $\rho_1(\theta)$ is equivalent to knowing
$E(D_\theta)$. The right hand side, being an expected rate, is independent of $N$. More
examples that relate $\rho_1$ to $E(D)$ by a simple formula are given in [1]. For
example, let $\{Z_t\}$ be a real valued zero mean stationary Gaussian process,
and define

$Y_t = Z_t^3$.

Then $\{Y_t\}$ is still stationary with mean zero, but it is no longer Gaussian.
Let $\rho_1(y)$ be the first order autocorrelation of $\{Y_t\}$, and let $D(y)$ be the
zero-crossing count in $Y_1, Y_2, \ldots, Y_N$. Then,

$\rho_1(y) = \frac{9}{10}\cos\Bigl(\frac{\pi E(D(y))}{N - 1}\Bigr) + \frac{1}{10}\cos\Bigl(\frac{3\pi E(D(y))}{N - 1}\Bigr)$.

Thus, from $E(D(y))$ we can get $\rho_1(y)$. Going in the reverse direction requires
the solution of a third degree equation.
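The link between zero-crossing counts and the first order autocorrelation is easy to illustrate on a noiseless sinusoid. The sketch below is an illustration only: it assumes the cosine relation in its usual form $\rho_1 = \cos(\pi D/(N-1))$, uses the observed count $D$ in place of its expectation, and the function names, sample size and phase are chosen here. For $\cos(\omega t + \varphi)$ the first order autocorrelation is $\cos\omega$, so $\pi D/(N-1)$ should come out close to $\omega$.

```python
import math


def zero_crossings(series):
    """Number of sign changes between consecutive observations."""
    return sum(1 for prev, cur in zip(series, series[1:]) if prev * cur < 0)


def rho_from_crossings(count, n_obs):
    """Cosine relation (assumed form): rho_1 = cos(pi * D / (N - 1))."""
    return math.cos(math.pi * count / (n_obs - 1))


N = 10000
omega = 0.8
series = [math.cos(omega * t + 0.3) for t in range(1, N + 1)]
D = zero_crossings(series)
omega_hat = math.pi * D / (N - 1)
print(D, omega_hat, rho_from_crossings(D, N), math.cos(omega))
```

With noise added, the raw count no longer recovers $\omega$ directly, which is the motivation for the filtered, iterated HOC sequences studied in the paper.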

Inspired by the algorithm presented in [2], we shall be concerned with
fixed points of $\rho_1(\theta)$ obtained from the recursion

$\theta_{j+1} = \rho_1(\theta_j)$  (1.2)

for some specific families of parametric filters. As it turns out, by controlling
the filter bandwidth, the fixed points can be made to coincide with the cosines of the
frequencies in the discrete spectrum of $\{Z_t\}$.


1.2.  The  problem 

Consider the mixed spectrum model for $t \in \{0, \pm 1, \pm 2, \ldots\}$,

$Z_t = \sum_{j=1}^{p} A_j \cos(\omega_j t + \varphi_j) + \zeta_t = X_t + \zeta_t$  (1.3)

where $p$ is not necessarily known, $A_1, \ldots, A_p$ are unknown constants,
$\omega_1, \ldots, \omega_p$ are unknown frequencies with values in $(-\pi, \pi]$, $\{\zeta_t\}$ is white
noise with mean 0 and variance $\sigma_\zeta^2$, and $\varphi_1, \ldots, \varphi_p$ are independent random
variables uniformly distributed in $(-\pi, \pi]$, and are independent of $\{\zeta_t\}$.
The assumption of white noise is not really needed, but it simplifies the
exposition. In fact, any continuous spectrum noise will do just as well.

The problem is to estimate $\omega_1, \ldots, \omega_p$ from recursive HOC sequences
of the form (1.2).

For this goal, we investigate the HOC sequences $\rho_1(\theta_j)$ from two parametric
families of filters, loosely referred to as the "alpha filter" and the
"complex filter." We also discuss briefly a third parametric filter to which
we refer as the "exponential filter."

2.2. The complex filter

The complex filter is defined by the transformation,

$Z_t(\alpha; M) = \bigl(1 + e^{i\theta(\alpha)} B\bigr)^M Z_t$

where $B$ is the backward shift ($B Z_t = Z_{t-1}$), $M$ is a positive integer,
$\alpha \in (-1, 1)$, and $\theta(\alpha) \in (-\pi, \pi)$. Think of
$M$ as being sufficiently large (e.g., $M = 20$) so that we can entertain the
approximation $\theta(\alpha) \approx \cos^{-1}(\alpha)$. This will be made more palatable in a
moment. Clearly,

$Z_t(\alpha; M) = \sum_{n=0}^{M} \binom{M}{n} e^{in\theta(\alpha)} Z_{t-n}$

and the impulse response is,

$h(n; \alpha, M) = \binom{M}{n} e^{in\theta(\alpha)}$ for $n = 0, \ldots, M$, and $0$ otherwise.

The corresponding squared gain is

$|H(\omega; \alpha, M)|^2 = 4^M \cos^{2M}\Bigl(\frac{\omega - \theta(\alpha)}{2}\Bigr)$
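for $\theta, \omega \in (-\pi, \pi]$, and $\alpha \in (-1, 1)$. Mimicking the fundamental property
(2.2) for the alpha filter, we define $\alpha$ by the equation,

$\alpha = \frac{\int_{-\pi}^{\pi} \cos^{2M}\bigl(\frac{\lambda - \theta(\alpha)}{2}\bigr)\cos(\lambda)\,d\lambda}{\int_{-\pi}^{\pi} \cos^{2M}\bigl(\frac{\lambda - \theta(\alpha)}{2}\bigr)\,d\lambda}$  (2.3)

As such, $\alpha$ is the real part of the first order autocorrelation of the filtered
white noise, which is not white anymore. As $M \to \infty$,

$\frac{\int_{-\pi}^{\pi} \cos^{2M}\bigl(\frac{\lambda - \theta(\alpha)}{2}\bigr)\cos(\lambda)\,d\lambda}{\int_{-\pi}^{\pi} \cos^{2M}\bigl(\frac{\lambda - \theta(\alpha)}{2}\bigr)\,d\lambda} \to \cos(\theta(\alpha))$  (2.4)

so that the approximation $\theta(\alpha) \approx \cos^{-1}(\alpha)$ makes sense. In fact, already
with $M = 20$, the approximation is excellent. We think of $\theta(\alpha)$ as the "center
of the filter."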
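Both the squared gain and the ratio defining $\alpha$ can be checked numerically. The sketch below is a hedged illustration: it assumes the complex filter has the binomial impulse response $\binom{M}{n} e^{in\theta}$, $n = 0, \ldots, M$, and the function names and grid size are choices made here. Under that assumption a short Fourier-coefficient calculation gives the ratio in (2.3) as $\cos(\theta) \cdot M/(M+1)$, which tends to $\cos\theta$ as $M \to \infty$, in agreement with (2.4).

```python
import cmath
import math


def squared_gain(omega, theta, M):
    """|H(omega)|^2 computed from the (assumed) binomial impulse response."""
    h = [math.comb(M, n) * cmath.exp(1j * n * theta) for n in range(M + 1)]
    return abs(sum(h_n * cmath.exp(-1j * n * omega) for n, h_n in enumerate(h))) ** 2


def closed_form_gain(omega, theta, M):
    """Closed form 4^M cos^{2M}((omega - theta)/2)."""
    return 4.0 ** M * math.cos((omega - theta) / 2.0) ** (2 * M)


def alpha_of_theta(theta, M, grid=4096):
    """Ratio of the two integrals in (2.3), by the rectangle rule on (-pi, pi]."""
    num = den = 0.0
    for j in range(grid):
        lam = -math.pi + 2.0 * math.pi * j / grid
        w = math.cos((lam - theta) / 2.0) ** (2 * M)
        num += w * math.cos(lam)
        den += w
    return num / den


theta, M = 1.0, 20
print(squared_gain(0.4, theta, M), closed_form_gain(0.4, theta, M))
print(alpha_of_theta(theta, M), math.cos(theta) * M / (M + 1), math.cos(theta))
```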

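From [2] we obtain the higher order correlation,

$\rho_1(\alpha, M) = \frac{\Re\{E[Z_t(\alpha, M)\,\overline{Z_{t-1}(\alpha, M)}]\}}{E|Z_t(\alpha, M)|^2}$

$= \frac{\sum_{j=1}^{p} \frac{A_j^2}{4}\bigl[\cos^{2M}\bigl(\frac{\omega_j - \theta(\alpha)}{2}\bigr) + \cos^{2M}\bigl(\frac{\omega_j + \theta(\alpha)}{2}\bigr)\bigr]\cos(\omega_j) + \frac{\sigma_\zeta^2}{2\pi}\int_{-\pi}^{\pi} \cos^{2M}\bigl(\frac{\lambda - \theta(\alpha)}{2}\bigr)\cos(\lambda)\,d\lambda}{\sum_{j=1}^{p} \frac{A_j^2}{4}\bigl[\cos^{2M}\bigl(\frac{\omega_j - \theta(\alpha)}{2}\bigr) + \cos^{2M}\bigl(\frac{\omega_j + \theta(\alpha)}{2}\bigr)\bigr] + \frac{\sigma_\zeta^2}{2\pi}\int_{-\pi}^{\pi} \cos^{2M}\bigl(\frac{\lambda - \theta(\alpha)}{2}\bigr)\,d\lambda}$  (2.5)

or, from (2.3),

$\rho_1(\alpha, M) = \frac{\sum_{j=1}^{p} \frac{A_j^2}{4}\bigl[\cos^{2M}\bigl(\frac{\omega_j - \theta(\alpha)}{2}\bigr) + \cos^{2M}\bigl(\frac{\omega_j + \theta(\alpha)}{2}\bigr)\bigr]\cos(\omega_j) + \bigl[\frac{\sigma_\zeta^2}{2\pi}\int_{-\pi}^{\pi} \cos^{2M}\bigl(\frac{\lambda - \theta(\alpha)}{2}\bigr)\,d\lambda\bigr]\alpha}{\sum_{j=1}^{p} \frac{A_j^2}{4}\bigl[\cos^{2M}\bigl(\frac{\omega_j - \theta(\alpha)}{2}\bigr) + \cos^{2M}\bigl(\frac{\omega_j + \theta(\alpha)}{2}\bigr)\bigr] + \frac{\sigma_\zeta^2}{2\pi}\int_{-\pi}^{\pi} \cos^{2M}\bigl(\frac{\lambda - \theta(\alpha)}{2}\bigr)\,d\lambda}$  (2.6)

Again, from (2.6), as with the alpha filter, $\rho_1(\alpha, M)$ is a weighted average of
$\cos(\omega_1), \ldots, \cos(\omega_p)$, and $\alpha$: a crucial observation that helps in recovering
all the $\omega_j$.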

We  think  of  pi  (a,  M)  as  a  function  of  a  with  parameter  M. 


3.  The  HK  algorithm 


The HK algorithm is best described in terms of the alpha filter for the case $p = 1$. So, let $p = 1$, and consider the alpha filter. It is convenient to define the weight function,


_  Var(Ci(o!))  ^ _ 

“  Var(Z,(a))  “ 

1  — 2ac<>s(a’i  •  4  a*- 


Clearly, $0 < C(\alpha) < 1$, and we obtain

$$\rho_1(\alpha) = (1 - C(\alpha)) \cos(\omega_1) + C(\alpha)\, \alpha. \qquad (3.2)$$

This weighted average implies that $\rho_1(\alpha)$ must be between $\alpha$ and $\cos(\omega_1)$. Choose any $\alpha_0 \in (-1, 1)$ and, without loss of generality, suppose $\alpha_0 < \cos(\omega_1)$. Then,

$$\alpha_0 < \rho_1(\alpha_0) < \cos(\omega_1).$$

Define the recursion [2]

$$\alpha_{k+1} = \rho_1(\alpha_k). \qquad (3.3)$$

It follows from (3.2) that

$$\alpha_0 \le \alpha_1 \le \alpha_2 \le \cdots \le \cos(\omega_1).$$

The sequence $\{\alpha_k\}$ is monotone and bounded, and hence it converges to $\alpha^*$, say. However, from (3.3),

$$\rho_1(\alpha^*) = \alpha^* = \cos(\omega_1).$$

Thus, $\cos(\omega_1)$ is obtained as a fixed point of the mapping $\rho_1(\alpha)$, and $\omega_1 = \cos^{-1}(\alpha^*)$.

Figure 3.1 shows the fixed point in $\rho_1(\alpha)$ for $\omega_1 = 0.8$ and the parameter values listed in its caption.

Figure 3.1: An attracting fixed point $\alpha^* = \cos(0.8)$ of $\rho_1(\alpha)$ from the alpha filter, $p = 1$, $\omega_1 = 0.8$, $A_1 = \sqrt{2}$, $\sigma_\varepsilon^2 = 1$.
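The recursion (3.3) can be tried on a toy model. The sketch below is only an illustration under stated assumptions: it takes the alpha filter to have squared gain $1/(1 - 2\alpha\cos\omega + \alpha^2)$, so that filtered white noise has variance $\sigma_\varepsilon^2/(1-\alpha^2)$ and lag-one autocorrelation $\alpha$, and it assigns the sinusoid a nominal variance `var_signal`. These modelling choices, and all function and parameter names, are assumptions of the example rather than the paper's definitions.

```python
import math

def rho1_alpha_filter(alpha, omega1, var_signal=1.0, var_noise=1.0):
    """rho_1(alpha) written as the weighted average (3.2) for p = 1.
    Assumes an AR(1)-type alpha filter (hypothetical model for this sketch)."""
    gain_at_omega1 = 1.0 / (1.0 - 2.0 * alpha * math.cos(omega1) + alpha ** 2)
    filtered_signal_var = var_signal * gain_at_omega1
    filtered_noise_var = var_noise / (1.0 - alpha ** 2)
    # Weight C(alpha): share of the filtered variance due to noise.
    c = filtered_noise_var / (filtered_signal_var + filtered_noise_var)
    return (1.0 - c) * math.cos(omega1) + c * alpha

def hk_fixed_point(alpha0, omega1, n_iter=60):
    """Iterate alpha_{k+1} = rho_1(alpha_k) and return the whole path."""
    alphas = [alpha0]
    for _ in range(n_iter):
        alphas.append(rho1_alpha_filter(alphas[-1], omega1))
    return alphas

alphas = hk_fixed_point(alpha0=0.0, omega1=0.8)
omega_hat = math.acos(alphas[-1])  # recover the frequency from the fixed point
print(alphas[:4], alphas[-1], omega_hat)
```

Starting below $\cos(\omega_1)$, the iterates increase monotonically toward $\cos(\omega_1)$, as the argument following (3.2) predicts.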

It  has  been  observed  in  [2]  and  [5]  that  any  filter  that  satisfies  the 
fundamental  property  (2.2),  gives  rise  to  the  same  algorithm,  and  the  same 
solution,  as  above.  In  many  cases  all  that  is  needed  is  a  reparametrization 
that  guarantees  (2.2). 

An extension of the HK algorithm to the general multiple frequency case is possible, but for this we need different families of filters, as done in the next section. The alpha filter, without any modification, cannot be applied in multiple frequency detection. If the recursion (3.3) is applied to $\rho_1(\alpha)$ in (2.1) when $p \ge 2$, the sequence $\{\alpha_k\}$ converges to a point between the lowest and highest $\cos(\omega_j)$.

4.  Multiple  frequency  estimation  using  the  complex  filter 

The HK algorithm can be extended to the multiple frequency case when the generated HOC are obtained from bandpass filters. The complex filter falls in this category when $M$ is large enough. This fact is illustrated in Figure 4.1 which shows the graph of the normalized squared gain,

$$\cos^{2M}\left(\frac{\omega - \theta}{2}\right), \qquad -\pi \le \omega \le \pi,$$

with $\theta = 2.0$, and $M = 40, 100$.

Figure 4.1: The normalized complex filter, with $M = 40, 100$, acts as a bandpass filter passing frequencies in a neighborhood of $\theta = 2.0$.

We can see that by increasing $M$, it is possible to discount the power associated with the entire spectral support except for a small band in a neighborhood of $\theta$. This means that we can practically filter out any desired band of frequencies, and at the same time amplify frequencies in a small neighborhood of $\theta$, the "center" of the filter. Based on (2.4), we shall assume that $M'$ is sufficiently large so that the approximation

$$\theta(\alpha) \approx \cos^{-1}(\alpha) \qquad (4.1)$$

holds for all $M > M'$.
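The narrowing of the passband with $M$ can be quantified. The sketch below is an illustration, not taken from the paper: it evaluates the normalized squared gain $\cos^{2M}((\omega - \theta)/2)$ and the distance from the center at which it drops to one half. The function names are assumptions made for this example.

```python
import math

def normalized_gain(omega, theta, M):
    """Normalized squared gain cos^{2M}((omega - theta) / 2)."""
    return math.cos((omega - theta) / 2.0) ** (2 * M)

def half_power_half_width(M):
    """Distance x from the center at which cos^{2M}(x / 2) equals 1/2."""
    return 2.0 * math.acos(0.5 ** (1.0 / (2 * M)))

if __name__ == "__main__":
    theta = 2.0
    for M in (40, 100):
        # Gain at a frequency 0.8 away from the center, and the half-power width.
        print(M, half_power_half_width(M), normalized_gain(1.2, theta, M))
```

The half-width shrinks roughly like $M^{-1/2}$, which matches the visual impression of Figure 4.1 for $M = 40$ and $M = 100$.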

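Choose $\alpha_0 \in (-1, 1)$ closer to $\cos(\omega_1)$ than to any other $\cos(\omega_j)$, $j \ge 2$. Assume that $\alpha_0 \in (-1, 1)$ is sufficiently close to $\cos(\omega_1)$ so that

$$\alpha_0 < \cos(\omega_1)$$

and such that $\theta(\alpha_0)$ is closest to $\omega_1$, and $\omega_1 < \theta(\alpha_0)$. This is possible due to (4.1). Then as $M \to \infty$,

$$\frac{\cos^{2M}\left(\frac{\omega_j - \theta(\alpha_0)}{2}\right) + \cos^{2M}\left(\frac{\omega_j + \theta(\alpha_0)}{2}\right)}{\cos^{2M}\left(\frac{\omega_1 - \theta(\alpha_0)}{2}\right)} \to 0, \qquad j \ne 1,$$

and

$$\frac{\cos^{2M}\left(\frac{\omega_1 + \theta(\alpha_0)}{2}\right)}{\cos^{2M}\left(\frac{\omega_1 - \theta(\alpha_0)}{2}\right)} \to 0.$$

Therefore, by dividing both the numerator and denominator in (2.6) by $\cos^{2M}\left(\frac{\omega_1 - \theta(\alpha_0)}{2}\right)$ one obtains, for sufficiently large $M$,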


Pi(ao,M)  = 


COS  ( CU 1  )  +  terms  th<it  gt>  to  0  -h 


[4  1 

■'i  ..  2M  /  '■  V'  ’  \  I 


1  +  torm*»  tl)dl  go  tt»  0  +• 


-''-4cos.'''(-U-';.2C-) 


Evidently, as $M$ increases, $\rho_1(\alpha_0, M)$ becomes arbitrarily close to being a weighted average of $\cos(\omega_1)$ and $\alpha_0$. Thus, because $\alpha_0 < \cos(\omega_1)$, we can find a sufficiently large $M_0$ so that

$$\alpha_0 < \rho_1(\alpha_0, M_0) < \cos(\omega_1).$$

Now let $\alpha_1 = \rho_1(\alpha_0, M_0)$ and repeat the argument for $M_1 > M_0$. Then,

$$\alpha_0 < \alpha_1 < \rho_1(\alpha_1, M_1) < \cos(\omega_1).$$

In general, define

$$\alpha_{k+1} = \rho_1(\alpha_k, M_k),$$

corresponding to the sequence $M_0 < M_1 < M_2 < \cdots$, to obtain a sequence

$$\alpha_0 < \alpha_1 < \alpha_2 < \cdots < \cos(\omega_1).$$

It follows that as $k \to \infty$,

$$\alpha_k \uparrow \alpha^* \le \cos(\omega_1).$$

On the other hand, for $M_k$ large enough, we can find small $\epsilon, \epsilon_1, \epsilon_2$ such that $\omega_1 + \epsilon < \theta(\alpha_k)$, and $\theta(\alpha_k) \in (\omega_1 - \epsilon_1, \omega_1 + \epsilon_2)$, so that
Pi(ak,Mk)  ^ 


•  •  ■  +  ^  cos^'^^  cos(cu, ) 

dcu 

'  ^  ^  ^  sz', ' :  r  >-  ( )  ac^ 


It is now clear that there are $\eta_1(k)$, $\eta_2(k)$ that go to zero with $k$, such that

$$\alpha_{k+1} = \rho_1(\alpha_k, M_k) \ge \frac{\cos(\omega_1) + \text{terms that go to } 0 + \alpha_k \eta_1(k)}{1 + \text{terms that go to } 0 + \eta_2(k)}.$$

It follows that as $k \to \infty$,

$$\alpha_k \to \alpha^* \ge \cos(\omega_1),$$

and we have proved the convergence of $\alpha_k$ to $\cos(\omega_1)$. The same argument can be used when $\alpha_0 > \cos(\omega_1)$. Thus $\alpha^* = \cos(\omega_1)$ becomes a fixed point of $\rho_1(\alpha, M)$, as $M \to \infty$. We have proved the following fact.

Theorem 4.1. Suppose $\{Z_t\}$ is the harmonic process (1.3), and suppose $\alpha_0 \in (-1, 1)$ is in a small neighborhood of $\cos(\omega_1)$ that contains no other $\cos(\omega_j)$, $j \ne 1$. Then there exists an increasing sequence $\{M_k\}$ such that

$$\alpha_{k+1} = \rho_1(\alpha_k, M_k) \to \alpha^* = \cos(\omega_1),$$

and $\alpha^*$ becomes a fixed point of $\rho_1(\alpha, M)$ as $M$ increases.


Remark 4.2. A somewhat different approach, leading to a sequence of fixed points, is to obtain for each fixed $M$ a fixed point $\alpha_M^*$ of $\rho_1(\alpha, M)$, from the iterations $\alpha_{k+1} = \rho_1(\alpha_k, M)$, and then show that $\alpha_M^* \to \cos(\omega_1)$, as $M \to \infty$. The validity of this approach is demonstrated in Figure 4.2 which illustrates the existence of fixed points in $\rho_1(\alpha, 10)$, $\rho_1(\alpha, 100)$, for $p = 2$, and $\omega_1 = 0.8$, $\omega_2 = 2.1$. The approach adopted above is that of shrinking the effective bandwidth (by increasing $M$) at each iteration of $\alpha_{k+1} = \rho_1(\alpha_k, M_k)$, and is in the spirit of Kedem and Yakowitz [4]. It can be argued that the convergence, toward the cosine of the frequency to be detected, achieved for a variable bandwidth is much faster.

Figure 4.2: Fixed points in $\rho_1(\alpha, 10)$, $\rho_1(\alpha, 100)$, from the complex filter for $p = 2$, $\omega_1 = 0.8$, $\omega_2 = 2.1$. (Panel titles: Complex Filter, $M = 10$ and $M = 100$; $\cos(0.8) = 0.697$, $\cos(2.1) = -0.505$.)

Remark 4.3. It is easy to see from (2.6) that for sufficiently large $M$, and when the filter is centered near $\omega_1$ (that is, the filter passes $\omega_1$), we essentially have the representation

$$\rho_1(\alpha, M) = \cos(\omega_1) + C(\alpha, M)(\alpha - \cos(\omega_1)),$$

where
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0  <  C(a,M)  = 


E|Ct(«)P _ 

-|H(tu,;a,M)P  +F.|(:,(a)|^ 


and $|H(\omega; \alpha, M)|^2 = 4^M \cos^{2M}((\omega - \theta(\alpha))/2)$. This is the general form used in [4]. This form shows clearly that $\rho_1(\alpha, M)$ is essentially a contraction at $\cos(\omega_1)$, and that the convergence rate of $\alpha_{k+1} = \rho_1(\alpha_k, M_k)$ depends on the contraction factor $C(\alpha, M)$. As $M$ increases with each iteration, $C(\alpha, M)$ decreases (as long as the filter passes $\omega_1$) and we obtain a speeded up rate of convergence.


4.1.  Examples 

To demonstrate the plan given in Theorem 4.1, we resort to some examples with simulated data. In practice $\rho_1(\alpha)$ must be estimated, and we use the estimate obtained from zero-crossings,

$$\hat{\rho}_1(\alpha) = \cos\left(\frac{\pi D_\alpha}{N - 1}\right), \qquad (4.2)$$

where $D_\alpha$ is the number of zero-crossings in the filtered series of length $N$. The choice of this estimate is justified because we deal with narrow band (large $M$) cases where the "cosine formula" (1.1) holds to a great degree. Observe that in (4.2) the dependence on $M$ is suppressed for simplicity.

The zero-crossing count is defined for real data, a fact which is not compatible with the complex-valued output of the complex filter. To overcome this technicality, a reasonable modification is to use the real counterpart defined by

$$h(n; \alpha, M) = \begin{cases} \binom{M}{n} \cos(\theta(\alpha) n), & n = 0, \ldots, M \\ 0, & \text{otherwise.} \end{cases}$$

For long data records, the modification is inconsequential. In the examples we use the approximation $\theta(\alpha) = \cos^{-1}(\alpha)$.

It is convenient to introduce the observed zero-crossing rate,

$$\hat{\gamma}_\alpha = \frac{\pi D_\alpha}{N - 1},$$

where $D_\alpha$ is the number of zero-crossings in the filtered series of length $N$. Then the algorithm becomes

$$\alpha_{k+1} = \cos(\hat{\gamma}_{\alpha_k}), \qquad (4.3)$$

where the parameter $M$ increases with each iteration.
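The recursion (4.3) can be sketched in a few lines. The code below is an illustration under assumptions, not the authors' implementation: it filters a simulated series with the real counterpart of the complex filter, counts sign changes, and updates the filter center by $\theta(\alpha) = \cos^{-1}(\alpha)$. The function names, the starting values `M0 = 5` and `step = 2`, the random seed, and the use of only fully overlapping filter outputs are choices made for this example.

```python
import math
import random

def real_complex_filter(M, theta):
    """Real counterpart of the complex filter: h(n) = C(M, n) cos(theta n)."""
    return [math.comb(M, n) * math.cos(theta * n) for n in range(M + 1)]

def zero_crossing_rate(z, h):
    """Filter z with h, count sign changes D, and return pi * D / (N - 1)."""
    M = len(h) - 1
    y = [sum(h[n] * z[t - n] for n in range(M + 1)) for t in range(M, len(z))]
    D = sum(1 for a, b in zip(y, y[1:]) if a * b < 0)
    return math.pi * D / (len(y) - 1)

def hk_iterate(z, theta0, M0=5, step=2, n_iter=10):
    """alpha_{k+1} = cos(gamma_hat_{alpha_k}), with M growing linearly."""
    theta, M, path = theta0, M0, []
    for _ in range(n_iter):
        gamma = zero_crossing_rate(z, real_complex_filter(M, theta))
        path.append((M, gamma))
        alpha = math.cos(gamma)      # update (4.3)
        theta = math.acos(alpha)     # recenter the filter at arccos(alpha)
        M += step
    return path

def simulate(omega, A, B, sigma, N, seed=0):
    """One sinusoid plus independent N(0, sigma^2) noise (hypothetical data)."""
    rng = random.Random(seed)
    return [A * math.cos(omega * t) + B * math.sin(omega * t) + rng.gauss(0.0, sigma)
            for t in range(N)]

if __name__ == "__main__":
    z = simulate(0.91, -0.390676, -0.938528, 1.0, 2030)
    for M, gamma in hk_iterate(z, theta0=1.21):
        print(M, round(gamma, 5))
```

The printed pairs play the role of the columns $M_k$ and $\hat{\gamma}_{\alpha_k}$ in the tables that follow. The numbers depend on the simulated noise and will not reproduce the published tables.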

The examples below pertain to $N = 2000$ (more precisely, 2000 plus the maximum $M$ used), and various SNR's (SNR: signal to noise ratio). The noise was generated from independent $N(0, \sigma_\varepsilon^2)$ random variables. In each case, we also report the performance of the periodogram for comparison.

The examples convey the potential of the algorithm (4.3) for frequency detection in ambient noise. From an applications point of view, it is interesting to note that once a true discrete frequency is "captured" in the bandpass, it is not lost throughout the procedure as long as $M$ increases judiciously. The examples indicate that linear growth is appropriate.


4.1.1. Example 1: p = 1

In this example, $\omega_1 = 0.91$, $A_1 = -0.390676$, $B_1 = -0.938528$, and $\sigma_\varepsilon^2 = 1$. With $\theta(\alpha_0) = 1.21$, the normalized HOC sequence $\{\hat{\gamma}_{\alpha_k}\}$ converges to 0.909946. By comparison, the corresponding periodogram estimate is $\hat{\omega}_1 = 0.91419$.

    k    M_k    \hat{\gamma}_{\alpha_k}
    0     5     1.05453
    1     7     0.96652
    2     9     0.94766
    3    11     0.92566
    4    13     0.91938
    5    15     0.91623
    6    17     0.91309
    7    19     0.90366
    8    21     0.90995
    9    23     0.90995

    \hat{\omega}_1 = 0.90995

4.1.2. Example 2: p = 2, moderate SNR

In this example $p = 2$, $\omega_1 = 0.5$, $\omega_2 = 1.2$, $A_1 = 1.68853$, $B_1 =$ 1 M565, $A_2 = -1.59479$, $B_2 = 0.238443$, and $\sigma_\varepsilon^2 = 1$. With $\theta(\alpha_0) = 0.8$, the normalized HOC sequence $\{\hat{\gamma}_{\alpha_k}\}$ converges to 0.49976. For $\theta(\alpha_0) = 2.0$, the convergence is toward 1.20069. This is to be compared with the corresponding periodogram estimates 0.50325 and 1.20304.

                \omega_1 = 0.5,           \omega_2 = 1.2,
                \theta(\alpha_0) = 0.8    \theta(\alpha_0) = 2.0
    k    M_k    \hat{\gamma}_{\alpha_k}   \hat{\gamma}_{\alpha_k}
    0    15     0.60349                   1.69102
    1    17     0.50605                   1.31227
    2    19     0.50134                   1.19597
    3    21     0.50134                   1.18812
    4    23     0.50134                   1.18812
    5    25     0.49976                   1.20069
    6    27     0.49976                   1.20069
    7    29     0.49976                   1.20069
    8    31     0.49976                   1.20069

    \hat{\omega}_1 = 0.49976,  \hat{\omega}_2 = 1.20069

4.1.3. Example 3: Detecting a weak component

Here we have two sinusoidal components where the first one is relatively weak. The parameters are $p = 2$, $\omega_1 = 0.97$, $\omega_2 = 2.1$, $A_1 = -0.699457$, $B_1 = -0.116106$, $A_2 = -1.15000$, $B_2 = 2.45159$, $\sigma_\varepsilon^2 = 2$. With $\theta(\alpha_0) = 0.85$, the normalized HOC sequence $\{\hat{\gamma}_{\alpha_k}\}$ converges to 0.97281. For $\theta(\alpha_0) = 1.8$, the convergence is toward 2.09963. This is to be compared with the corresponding periodogram estimates which are 0.97240 and 2.13928.

                \omega_1 = 0.97,          \omega_2 = 2.1,
                \theta(\alpha_0) = 0.85   \theta(\alpha_0) = 1.8
    k    M_k    \hat{\gamma}_{\alpha_k}   \hat{\gamma}_{\alpha_k}
    0    20     0.92566                   2.09492
    1    22     0.96652                   2.10121
    2    24     0.97281                   2.10121
    3    26     0.96967                   2.10121
    4    28     0.97281                   2.09963
    5    30     0.97595                   2.09963
    6    32     0.97281                   2.09963
    7    34     0.97281                   2.09963

    \hat{\omega}_1 = 0.97281,  \hat{\omega}_2 = 2.09963

5. A brief look at the complex exponential filter

Clearly, the preceding development can be repeated with numerous families of bandpass filters. A particularly good choice, in terms of the rate of convergence toward the fixed points, is the parametric complex exponential filter defined by the impulse response,

$$h(n; \alpha, M) = \begin{cases} \exp(in\theta(\alpha))/\sqrt{2M + 1}, & |n| \le M \\ 0, & |n| > M \end{cases}$$

corresponding to the squared gain,

$$|H(\omega; \alpha, M)|^2 = \frac{1}{2M + 1}\, \frac{\sin^2\left[\frac{1}{2}(2M + 1)(\omega - \theta(\alpha))\right]}{\sin^2\left[\frac{1}{2}(\omega - \theta(\alpha))\right]}, \qquad -\pi \le \omega \le \pi,$$

where $M$ is a positive integer. This parametric filter was considered in [4]. The fundamental property is manifested through the relationship [4],

$$\alpha = \frac{\int_{-\pi}^{\pi} \cos(\omega)\, |H(\omega; \alpha, M)|^2\, d\omega}{\int_{-\pi}^{\pi} |H(\omega; \alpha, M)|^2\, d\omega}.$$

It holds exactly for

$$\theta(\alpha) = \cos^{-1}\left(\frac{2M + 1}{2M}\, \alpha\right)$$

when $|\alpha| \le 2M/(2M + 1)$. Again, as $M$ increases, $\theta(\alpha) \approx \cos^{-1}(\alpha)$, and $\alpha \in (-1, 1)$. The corresponding $\alpha_k$ sequence converges very fast, provided $\alpha_0$ is sufficiently close to $\cos(\omega_j)$ for some $j$. This is illustrated graphically in Figure 5.1. The figure shows the existence of fixed points $\alpha_M^*$ in

$$\rho_1(\alpha, M) = \frac{\Re\{E[Z_t(\alpha, M)\, \overline{Z_{t-1}(\alpha, M)}]\}}{E|Z_t(\alpha, M)|^2}$$

for $M = 10, 20$, and $\omega_1 = 0.8$, $\omega_2 = 2.1$, the case considered earlier. Near $\cos(0.8)$, $\cos(2.1)$, the derivative of $\rho_1(\alpha, M)$, $M = 10, 20$, is essentially 0 and the convergence of $\alpha_k = \rho_1(\alpha_{k-1}, M)$ toward $\alpha_M^*$ is very fast. Already with $M = 20$, it is difficult to see a significant difference between $\alpha_{20}^*$ and the cosine of the frequency of interest.
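The fundamental property of the complex exponential filter can be verified directly from its impulse response. The sketch below is an illustration, with function names chosen for this example: it computes the real part of the lag-one autocorrelation of filtered white noise from the filter coefficients and checks that it returns $\alpha$ when $\theta(\alpha) = \cos^{-1}((2M+1)\alpha/(2M))$.

```python
import cmath
import math

def exp_filter_rho1_white_noise(theta, M):
    """Real part of the lag-one autocorrelation of unit-variance white noise
    passed through h(n) = exp(i n theta) / sqrt(2M + 1), |n| <= M."""
    h = {n: cmath.exp(1j * n * theta) / math.sqrt(2 * M + 1)
         for n in range(-M, M + 1)}
    r0 = sum(abs(v) ** 2 for v in h.values())                      # lag 0
    r1 = sum(h[n] * h[n - 1].conjugate() for n in range(-M + 1, M + 1))  # lag 1
    return (r1 / r0).real

def theta_exact(alpha, M):
    """Filter center for which the fundamental property holds exactly;
    requires |alpha| <= 2M / (2M + 1)."""
    return math.acos((2 * M + 1) * alpha / (2 * M))

if __name__ == "__main__":
    for M in (10, 20):
        alpha = 0.5
        print(M, exp_filter_rho1_white_noise(theta_exact(alpha, M), M),
              theta_exact(alpha, M), math.acos(alpha))
```

The last two printed columns show $\theta(\alpha)$ approaching $\cos^{-1}(\alpha)$ as $M$ increases.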

As expected, a much faster rate of convergence is provided by the iterates of $\rho_1(\alpha, M)$ for fixed $M$, and the "basin of convergence" becomes more pronounced. This can be seen very clearly in Figure 5.2 which shows the graph of 10 iterates $\rho_1^{(10)} = \rho_1 \circ \rho_1 \circ \cdots \circ \rho_1$. This means that the procedure of increasing $M$ with each iteration leads to a speeded up rate of convergence of the HK algorithm, a fact highlighted already in [4].


5.1. Example 4: use of the exponential filter

To illustrate the performance of the HK algorithm using the complex exponential filter, we resort again to HOC from zero-crossings obtained from the real counterpart of $h(n; \alpha, M)$,

$$h(n; \alpha, M) = \begin{cases} \cos(n\theta(\alpha))/\sqrt{2M + 1}, & |n| \le M \\ 0, & |n| > M. \end{cases}$$

The following table gives the observed zero-crossing rate $\hat{\gamma}_{\alpha_k}$ obtained from the recursion $\alpha_{k+1} = \cos(\hat{\gamma}_{\alpha_k})$, using the exact same data as in Example 3 above. The results in both cases are quite comparable.

Figure 5.2: The graph of 10 iterates $\rho_1 \circ \rho_1 \circ \cdots \circ \rho_1$ of $\rho_1(\alpha, 10)$, $\rho_1(\alpha, 20)$, from the complex exponential filter for $p = 2$, $\omega_1 = 0.8$, $\omega_2 = 2.1$. (Panel title: Exp. Filter, $M = 20$; $\cos(0.8) = 0.697$, $\cos(2.1) = -0.505$.)

                \omega_1 = 0.97,          \omega_2 = 2.1,
                \theta(\alpha_0) = 0.85   \theta(\alpha_0) = 1.8
    k    M_k    \hat{\gamma}_{\alpha_k}   \hat{\gamma}_{\alpha_k}
         40
         50                               2.10121
    5    60     0.96967                   2.10121

    \hat{\omega}_1 = 0.96967,  \hat{\omega}_2 = 2.10121

6. Summary

The HK algorithm for multiple frequency detection in noise has been described in terms of some parametric families of filters. It has been demonstrated with the help of artificial data that the algorithm, when following the particular form

$$\alpha_{k+1} = \cos(\hat{\gamma}_{\alpha_k})$$

using (real) bandpass filters to generate HOC from zero-crossings, performs quite remarkably. The examples indicate that only a few iterations are needed in order to achieve a very satisfactory precision. Equally good results were obtained using a variety of other bandpass filters, such as the sine Butterworth bandpass filter constructed from lowpass and highpass filters. Under some assumptions, including the assumption that the cosine formula (1.1) holds, we can show that as $N \to \infty$, the sequence $\{\alpha_k\}$ converges with probability one to the cosine of the desired frequency. The two main ingredients needed in order to establish this are a well behaved (close to zero) derivative of $\rho_1(\alpha, M)$ near the cosine of the frequency of interest, and the fact that the asymptotic zero-crossing rate lies between the highest and lowest nonnegative frequencies in the spectral support [3]. This matter will be taken up elsewhere.

Hidden periodicities may be more easily detected if sampling takes place at random times.

1. Introduction

In this paper we consider the following natural problem. Suppose $f : \mathbb{R} \to \mathbb{C}$ is a function of the form

$$f(x) = \sum_{m=1}^{M} a_m \exp(i\lambda_m x) \qquad (1.1)$$

with $a_m \in \mathbb{C}$ and $\lambda_m \in \mathbb{R}$ $[1 \le m \le M]$, but that we do not know the values of the $a_m$ and $\lambda_m$ or even of $M$. The result of making an observation at a point $x_n \in \mathbb{R}$ is given by

$$\tilde{f}(x_n) = f(x_n) + e_n \qquad (1.2)$$

where $e_n$ is some unknown random error. We are allowed to make observations at $x_1, x_2, \ldots, x_N$ and wish to estimate the $a_m$ and $\lambda_m$ from the resulting observations $\tilde{f}(x_1), \tilde{f}(x_2), \ldots, \tilde{f}(x_N)$.

As it stands the question is not fully defined. We make the following remarks.

1) Unless the error is zero it is unreasonable to expect to recover those $\lambda_m$ for which the associated $a_m$ are very small. Even if there is no error we would expect a method which claimed to recover $\lambda_m$ for all nonzero $a_m$ to be numerically unstable.

2) It is unreasonable to expect to recover those $\lambda_m$ outside a previously specified set $\Lambda$ composed of a finite number of intervals. Dirichlet's theorem (see, e.g., [2], pp. 177-178) tells us that, given any function $f$ of the form

$$f(x) = \sum_{m=1}^{M} a_m \exp(i\lambda_m x)$$

with $a_m \in \mathbb{C}$ and $\lambda_m \in \mathbb{R}$ $[1 \le m \le M]$, together with $x_1, x_2, \ldots, x_N \in \mathbb{R}$, and any $\epsilon > 0$ and $K > 0$, we can find $\lambda'_m \in \mathbb{R}$ such that

$$|\lambda'_m - \lambda_m| > K$$

but

$$\left| \sum_{m=1}^{M} a_m \exp(i\lambda'_m x_n) - \sum_{m=1}^{M} a_m \exp(i\lambda_m x_n) \right| < \epsilon$$

for all $1 \le n \le N$.

3) Next we observe that if $f$ is given by

$$f(t) = \exp(i\lambda t) - \exp(i(\lambda + \eta)t)$$

then $|f(t)| < \epsilon$ for all $|t| < |\eta|^{-1}\epsilon/2$. It is thus unreasonable to seek a method which does not require previous knowledge of

$$\lambda^* = \min_{r \ne s} |\lambda_r - \lambda_s|.$$

4) We also need to know something about the errors $e_n$. In what follows we shall assume that the $e_n$ are independent identically distributed random variables with mean $\mathbb{E} e_n = 0$ and variance $\mathbb{E}|e_n|^2 = \sigma^2$. We shall further assume that the distribution of $e_n$ is Gaussian (since we work in $\mathbb{C}$ this means that $\arg e_n$ is uniformly distributed on $[0, 2\pi)$ and $\mathrm{Re}\, e_n$ is a real Gaussian random variable) but our arguments apply, with hardly any change, whenever the distribution of $e_n$ is reasonably well behaved. More complicated models of errors exist in which the $e_n$ are not independent. I believe that the method proposed will produce similar results for some of these as well.

5) We must have some idea, not only of the cost of a single computation, but of the cost of making a single observation. The relative cost may lead us to prefer a method requiring many computations but few observations or vice versa. The absolute cost may make some methods impractical.

So far the considerations we have raised apply to all methods. Our discussion will make two specific assumptions which will not always be satisfied in practice.

6) We assume that the choice of $x_1, x_2, \ldots, x_N$ precedes any of the observations. In other words our method is not adaptive (though, of course, it could be incorporated into an adaptive scheme).

7) Finally we assume that the $x_1, x_2, \ldots, x_N$ may be chosen freely from the reals.

In order to illustrate the remarks above and to provide a comparison with the methods we shall propose, consider the following standard procedure. Suppose that $f$ is as in equation (1.1) and we know that the $\lambda_m$ all lie well within $\Lambda = [-\alpha, \alpha]$. Choose $T > 0$ and set

$$x_n = -T + 2nT/N.$$

We now obtain the observations $\tilde{f}$ given by equation (1.2) and compute the Fourier transform

$$\tilde{F}(-\alpha + 2r\alpha/N) = N^{-1} \sum_{n=1}^{N} \tilde{f}(x_n) \exp(-i(-\alpha + 2r\alpha/N)x_n)$$

for $r = 1, 2, \ldots, N$, using the fast Fourier transform. Provided that $N$ is reasonably large (specifically, $N^{1/2}/\log N$ is large compared with $\sigma/\min_{1 \le m \le M} |a_m|$) it is clear that $\tilde{F}(-\alpha + 2r\alpha/N)$ will not differ appreciably from

$$F(-\alpha + 2r\alpha/N) = N^{-1} \sum_{n=1}^{N} f(x_n) \exp(-i(-\alpha + 2r\alpha/N)x_n).$$

In particular, provided that $\lambda^* T$ is large, the graph of $\tilde{F}$ will exhibit $M$ typical regions of disturbance centered on the $\lambda_m$ of maximum amplitude $|a_m|$ and typical width of the order $T^{-1}$. Thus, provided we take $N$ roughly of the same size as $T\alpha$, we can locate the $\lambda_m$ to a precision of about $T^{-1}$. This method thus requires of the order of $\alpha\delta^{-1}$ observations and $\alpha\delta^{-1} \log(\alpha\delta^{-1})$ computations to locate the frequencies $\lambda_m$ to within $\delta$.

In the next section we give an informal description of our proposed method along the same lines. Section 3 compares our method with the one just described. The final section contains further extensions of our method. In particular we show there that (provided we are prepared to accept a certain probability of error, which may be as small as we please) we can locate the frequencies $\lambda_m$ to within $\delta$ using only of the order of $\log \delta^{-1} \log\log \delta^{-1}$ observations and $\log \delta^{-1} \log\log \delta^{-1}$ computations.

While the full strength of this last result is unlikely to be attainable in practice, the underlying method is easy to implement and does give substantial savings (at least compared to the standard procedure outlined above).

2. Description

Choose $T > 0$ and a reasonably large integer $N$ (for simple problems $N = 400$ might be sufficient). Choose $x_1, x_2, \ldots, x_N$ at random in the interval $[-T, T]$. More formally set $x_n = X_n$ where $X_1, X_2, \ldots, X_N$ are independent, identically distributed random variables each uniformly distributed on $[-T, T]$. (Of course, the interval $[-T, T]$ may be replaced by any other interval of length $2T$.)

Now suppose that $f$ is as in equation (1.1) and we have the observations $\tilde{f}$ given by equation (1.2). We define an approximate Fourier transform $\tilde{F}(\lambda)$ for a frequency $\lambda$ by

$$\tilde{F}(\lambda) = N^{-1} \sum_{n=1}^{N} \tilde{f}(X_n) \exp(-i\lambda X_n).$$

Observe that (keeping $\lambda$ fixed) $\tilde{F}(\lambda)$ is a random variable given by

$$\tilde{F}(\lambda) = N^{-1} \sum_{n=1}^{N} Y_n$$

where the

$$Y_n = (f(X_n) + e_n) \exp(-i\lambda X_n)$$

are themselves independent, identically distributed random variables taking values in $\mathbb{C}$. We note that

$$\mathbb{E} Y_n = \mathbb{E}(f(X_n) \exp(-i\lambda X_n)) = F(\lambda)$$

where

$$F(\lambda) = \frac{1}{2T} \int_{-T}^{T} f(t) \exp(-i\lambda t)\, dt$$

and that

$$\mathbb{E}|Y_n|^2 = \mathbb{E}\left(|f(X_n)|^2 + e_n^* f(X_n) + e_n f(X_n)^* + |e_n|^2\right) = \mathbb{E}|f(X_n)|^2 + \mathbb{E}|e_n|^2.$$

If $\lambda^* T$ is sufficiently large the cross terms in the evaluation of $\mathbb{E}|f(X_n)|^2$ will be negligible, and so setting $\tau^2 = \mathrm{var}(Y_n)$, we will have

$$\tau^2 \le \sum_{m=1}^{M} |a_m|^2 + \sigma^2 + \eta \qquad (2.1)$$

where $\eta$ is negligible.

Applying the central limit theorem we see that

$$N^{1/2}(\tilde{F}(\lambda) - F(\lambda)) = N^{-1/2} \sum_{n=1}^{N} (Y_n - \mathbb{E}(Y_n))$$

is approximately Gaussian with mean 0 and variance $\tau^2$. In other words

$$\tilde{F}(\lambda) = F(\lambda) + E(\lambda) \qquad (2.2)$$

where $E(\lambda)$ is approximately normal with mean zero and variance $\tau^2 N^{-1}$. I think of $E(\lambda)$ as noise which is partly natural, coming from the errors $e_n$, and partly artificially induced by the random choice of the sample points. In the same way I interpret the inequality (2.1) by saying that the variance $\tau^2$ has a natural component $\sigma^2$ and an artificially induced component $\sum_{m=1}^{M} |a_m|^2$. Because of the nature of the Gaussian distribution we would not be surprised to find $|E(\lambda)|$ of size about $\tau N^{-1/2}$ or even about $2\tau N^{-1/2}$ but it would be extremely surprising to find $|E(\lambda)|$ many times larger than that.

With this in mind let us fix a $K \ge 3$. The probability that $|E(\lambda)| \ge K\tau N^{-1/2}$ is $p(K)$ with $p(K)$ approximately given by

$$\frac{2}{\sqrt{2\pi}} \int_{K}^{\infty} \exp(-x^2/2)\, dx.$$

Since

$$F(\lambda) = \frac{1}{2T} \int_{-T}^{T} f(t) \exp(-i\lambda t)\, dt = \sum_{m=1}^{M} a_m \frac{\sin((\lambda_m - \lambda)T)}{(\lambda_m - \lambda)T} \qquad (2.3)$$

we see that, if $|\lambda_m - \lambda|$ is reasonably large compared with $T^{-1}$ for all $m$, then $|\tilde{F}(\lambda)|$ will be smaller than $(K + 1)\tau N^{-1/2}$ with probability at least $1 - p(K)$. On the other hand, provided that $\lambda^* T$ is reasonably large, if $|a_r| \ge 3K\tau N^{-1/2}$ then for $\lambda$ close to $\lambda_r$ the term

$$a_r \frac{\sin((\lambda_r - \lambda)T)}{(\lambda_r - \lambda)T}$$

will dominate in the expansion

$$\tilde{F}(\lambda) = \sum_{m=1}^{M} a_m \frac{\sin((\lambda_m - \lambda)T)}{(\lambda_m - \lambda)T} + E(\lambda)$$

with probability at least $1 - p(K)$.

So far we have looked at $\tilde{F}(\lambda)$ for a single value of $\lambda$. Now suppose that we are given $\Lambda$, the union of a finite number of disjoint intervals of total length $|\Lambda|$. Provided $R$ is reasonably large compared with the number of intervals making up $\Lambda$, we can find $R$ points $\nu_1, \nu_2, \ldots, \nu_R \in \Lambda$ such that if $\lambda \in \Lambda$ then $|\lambda - \nu_r| < |\Lambda|/R$ for some $r$, $1 \le r \le R$. It is, of course, not true that the 'errors' $E(\nu_r)$ are mutually independent but the simplest possible estimate shows that

$$|E(\nu_r)| \le K\tau N^{-1/2} \quad \text{for all } 1 \le r \le R \qquad (2.4)$$

with probability at least $1 - Rp(K)$. If equation (2.4) holds then the same kind of considerations as applied in the previous paragraph and in the penultimate paragraph of the introduction will apply, provided we have $|\Lambda|R^{-1}$ of size about $(6T)^{-1}$ or smaller. If $|a_m| \ge 3K\tau N^{-1/2}$ and $\lambda_m$ is well within $\Lambda$ we shall see a typical region of disturbance centered on $\lambda_m$ of amplitude roughly $|a_m| \pm K\tau N^{-1/2}$ and width of the order $T^{-1}$ standing out from the surrounding noise. We may or may not detect regions associated with smaller $|a_m|$ but, provided we ignore all $\nu_r$ with $|\tilde{F}(\nu_r)| \le (K + 1)\tau N^{-1/2}$, we shall not obtain any 'false positives'.
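The detection procedure just described can be sketched as follows. This is an illustration under assumptions, not the author's code: the test signal, the noise level, the seed, the grid and the crude estimate of $\tau$ (the root mean square of the observations, which bounds $\mathrm{var}(Y_n)$ from above) are all choices made for this example.

```python
import cmath
import math
import random

def observe(f, xs, sigma, rng):
    """Observations f(x_n) + e_n with complex Gaussian errors, E|e_n|^2 = sigma^2."""
    s = sigma / math.sqrt(2.0)
    return [f(x) + complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for x in xs]

def approx_transform(lam, xs, obs):
    """Approximate transform: N^{-1} sum_n obs_n exp(-i lam X_n)."""
    return sum(y * cmath.exp(-1j * lam * x) for x, y in zip(xs, obs)) / len(xs)

def detect(xs, obs, grid, K=4.0):
    """Keep the grid points where the transform exceeds (K + 1) tau N^{-1/2}."""
    N = len(xs)
    tau = math.sqrt(sum(abs(y) ** 2 for y in obs) / N)  # crude upper estimate
    threshold = (K + 1.0) * tau / math.sqrt(N)
    return [nu for nu in grid if abs(approx_transform(nu, xs, obs)) > threshold]

rng = random.Random(1)
T, N = 20.0, 400
f = lambda x: cmath.exp(5j * x) + 0.8 * cmath.exp(-3j * x)  # hypothetical signal
xs = [rng.uniform(-T, T) for _ in range(N)]                 # random sample points
obs = observe(f, xs, sigma=1.0, rng=rng)
grid = [-10.0 + 20.0 * r / 2400 for r in range(2401)]       # spacing (6T)^{-1}
hits = detect(xs, obs, grid)
print(min(hits), max(hits), len(hits))
```

With these settings the surviving grid points cluster around the two frequencies 5 and -3 of the hypothetical signal, in regions of width of order $T^{-1}$.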

A well-known estimate gives

$$\frac{2}{\sqrt{2\pi}} \int_{K}^{\infty} \exp(-x^2/2)\, dx \le \frac{2}{K\sqrt{2\pi}} \int_{K}^{\infty} x \exp(-x^2/2)\, dx = \frac{2}{K\sqrt{2\pi}} \exp(-K^2/2).$$
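The tail estimate is easy to check numerically. The sketch below is an illustration, with function names chosen for this example: it compares the exact two-sided tail of a standard real Gaussian, computed with the complementary error function, against the bound $2K^{-1}(2\pi)^{-1/2}\exp(-K^2/2)$.

```python
import math

def two_sided_tail(K):
    """P(|Z| >= K) for a standard real Gaussian Z."""
    return math.erfc(K / math.sqrt(2.0))

def tail_bound(K):
    """Upper bound 2 / (K sqrt(2 pi)) * exp(-K^2 / 2)."""
    return 2.0 / (K * math.sqrt(2.0 * math.pi)) * math.exp(-K * K / 2.0)

if __name__ == "__main__":
    for K in (3.0, 4.0, 5.0, 6.0):
        print(K, two_sided_tail(K), tail_bound(K))
```

For $K \ge 3$ the bound is within roughly ten percent of the exact tail, so little is lost by using it in the relations that follow.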

Thus the procedure of the last paragraph will locate all the $\lambda_m$ in the search region for which $|a_m| \ge a^*$ to a precision of about $\delta$ (and not produce false positives) with a probability of failure less than $\epsilon$ provided the following relations hold. (Here $u \gtrsim v$ is to be read as $u \ge Cv$ for some numerical constant $C$.)

$$\delta \gtrsim T^{-1}$$
$$T^{-1} \gtrsim |\Lambda| R^{-1}$$
$$a^* \gtrsim K\tau N^{-1/2}$$
$$\epsilon \gtrsim R K^{-1} \exp(-K^2/2).$$

Thus we need

$$\epsilon N^{1/2} a^* \tau^{-1} \exp(N(a^* \tau^{-1})^2/2) \gtrsim R \gtrsim |\Lambda| \delta^{-1} \qquad (2.5)$$

for our method to perform as desired. The inequality (2.5) tells us how we must vary $N$, the number of sample points. Since the sample points are not in arithmetical progression we cannot use the fast Fourier transform, and our method will require of the order of $RN$ computations.

3. Discussion

The reader who is worried by the probability $\epsilon$ of error should observe that, if we fix the other parameters, and seek to reduce $\epsilon$ by increasing $N$, then $N$ need only increase slower than $\log(\epsilon^{-1})$. Thus an essentially trivial increase in the number of sample points and computations will reduce the probability of failure inherent in the method below, e.g., the probability of some serious undetected computer error. It should also be noted that if $\sigma > 0$ then any method must inevitably have a strictly positive probability of error.

Let us now compare our method with that discussed at the end of the introduction. Let us fix $\epsilon$ and $a^*$ (for example we might take $a^* = \min_{1 \le m \le M} |a_m|$). In the introduction we took $\Lambda = [-\alpha, \alpha]$ and saw that the standard method required of the order of $|\Lambda|\delta^{-1}$ sample points and $|\Lambda|\delta^{-1} \log(|\Lambda|\delta^{-1})$ computations to locate the frequencies $\lambda_m$ to within $\delta$. Our method, which allows $\Lambda$ to be the finite union of intervals, requires, at most, of the order of $\log(|\Lambda|\delta^{-1})$ sample points and of the order of $|\Lambda|\delta^{-1} \log(|\Lambda|\delta^{-1})$ computations. We note, for later use, that the choice of $\Lambda$ can be made after the observations.

There are various remarks we should make at this stage.

1) Although we have achieved a substantial reduction in the number of sample points required, the number of computations has not been reduced. In Section 4 we shall see that this weakness can, to a large extent, be overcome.

2) If we keep $a_1, a_2, \ldots, a_M$ fixed but allow the $\lambda_m$ to vary, it is not hard to see that the accuracy $\delta$ and the power of discrimination $\lambda^*$ vary in step with each other. Thus our method requires, at most, of the order of $\log(|\Lambda|\lambda^{*-1})$ sample points and of the order of $|\Lambda|\lambda^{*-1} \log(|\Lambda|\lambda^{*-1})$ computations. Although we have chosen the random variables $X_n$ to have uniform probability distribution on $[-T, T]$, smoothing them might well produce better discrimination in practice. For example, if we take the $X_n$ to be independent, identically distributed random variables each with density function

$$g(x) = \begin{cases} T^{-2}(T - |x|), & \text{for all } |x| \le T \\ 0, & \text{otherwise} \end{cases}$$

and proceed as before, our method and the supporting argument are essentially unchanged except that equation (2.3) becomes

$$F(\lambda) = \sum_{m=1}^{M} a_m \frac{(\sin((\lambda_m - \lambda)T/2))^2}{((\lambda_m - \lambda)T/2)^2}.$$

We have thus localised the disturbance associated with the frequency $\lambda_m$ rather better than before.

3) The 'detection/noise' ratio

.  _  _ _ _ 

only  decreases  as  fast  as  Thus  to  reduce  p*  by  a  factor  of  L 

whilst  keeping  everything  else  unchanged  requires  us  to  multiply  the 
number  of  sample  points,  and  so,  also,  the  number  of  computations  by 
a  factor  of  order  L^. 

At first sight this does not seem very good, but a simple example
shows that the rates of growth in the last sentence cannot be improved.
Suppose that we simplify our problem so that equation (1.1) becomes

$$f(x) = \alpha$$

with $\alpha \in \mathbb{C}$. Then equation (1.2) gives

$$f(X_n) = \alpha + \epsilon_n$$

and our problem reduces to finding whether $\alpha = 0$ or not. Simple
statistical considerations show that, in order to be reasonably confident
of detecting $\alpha \neq 0$ when $|\alpha| \ge \alpha^*$ and not declaring $\alpha \neq 0$ when, in fact,
$\alpha = 0$, we need a 'detection/noise' ratio


rather larger than 1. Since we must read the values of $f(X_n)$ in order
to use them we must make at least $N$ computations. Just as before, to
reduce $\rho^{**}$ by a factor of $L$ whilst keeping everything else unchanged
requires us to multiply the number of sample points, and so, also, the
number of computations by a factor of order $L^2$. It is, of course, true
that $\rho^{**}$ only involves the 'natural noise level' $\sigma$ whilst $\rho^*$ involves a
noise level with an additional 'artificial component'. In
Section 4 we shall sketch a way of getting round this.

4) We turn now to the estimation of $\alpha_m$. The obvious way of doing
this is to guess $\alpha_m = F(\nu_j)$ where $\nu_j$ is the point at which $|F(\nu_k)|$ is
largest within the disturbance associated with $\lambda_m$. Other schemes are
possible, but the reader should remember that the random errors $E(\nu_k)$
of equation (2.2) are not independent. With our simple scheme
there are two sources of error. The first source is the distance of $\nu_j$
from $\lambda_m$. This error is controlled by $\delta$ and, provided $\delta$ has already
been chosen quite small, will not be important. (In any case, once we
have located $\lambda_m$ as lying near $\nu_j$ we can always calculate $F(\nu)$ for a
group of closely spaced points near $\nu_j$ without adding noticeably to
the computational load. We shall develop this idea substantially in
Remark 4.2 of the next section.)

We are thus only worried by the second source of error, to wit
the noise $E(\lambda)$. Almost exactly the same considerations as applied to
the problem of detection in note 3 show that this error will be of the
order of $N^{-1/2}\tau$ for our method and that any method whatsoever must
have errors in the estimation of $\alpha_m$ of order at least $N^{-1/2}\sigma$. The last
sentence of 3 thus applies with $\rho^*$ and $\rho^{**}$ replaced by the corresponding
error-level ratios $\epsilon^*$ and $\epsilon^{**}$.
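The sinc-squared kernel of remark 2 can be checked numerically. The Python sketch below is illustrative only. It assumes that the smoothed density is the triangular density $T^{-2}(T - |x|)$ on $[-T, T]$, and the helper names `triangular_density`, `kernel_numeric` and `kernel_closed_form` are hypothetical, not taken from the paper. It compares a midpoint-rule approximation of $\int g(x)e^{itx}\,dx$ with $(\sin(tT/2)/(tT/2))^2$, where $t$ plays the role of $\lambda_m - \lambda$.

```python
import cmath
import math


def triangular_density(x, T):
    """Assumed density: T^-2 (T - |x|) on [-T, T], zero elsewhere."""
    return (T - abs(x)) / T**2 if abs(x) <= T else 0.0


def kernel_numeric(t, T, steps=20000):
    """Midpoint-rule approximation of the integral of g(x) exp(i t x) over [-T, T]."""
    h = 2.0 * T / steps
    total = 0j
    for k in range(steps):
        x = -T + (k + 0.5) * h
        total += triangular_density(x, T) * cmath.exp(1j * t * x)
    return total * h


def kernel_closed_form(t, T):
    """(sin(tT/2) / (tT/2))^2, with the limiting value 1 at t = 0."""
    u = t * T / 2.0
    return 1.0 if u == 0.0 else (math.sin(u) / u) ** 2


if __name__ == "__main__":
    T = 3.0
    for t in (0.0, 0.4, 1.0, 2.5):
        # The two columns should agree to many decimal places.
        print(t, kernel_numeric(t, T).real, kernel_closed_form(t, T))
```

Because the kernel decays like $t^{-2}$ rather than $t^{-1}$, the contribution of one frequency to $F$ at a distant point is smaller than with uniform sampling, which is the sense in which the disturbance is better localised.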


4.  Multistage  searching 

Suppose  as  usual  that  f  is  as  in  equation  (1.1)  and  that  we  have  chosen  e 
and $\alpha^*$ once and for all. Suppose further that we know a $\delta > 0$ such that
$\lambda^* = \min_{1 \le r < s \le M} |\lambda_r - \lambda_s|$ is substantially larger than $\delta$. Let us say that $\lambda_j$ is
an important frequency if $|\alpha_j| \ge \alpha^*$. We also fix $\epsilon > 0$ which will correspond
to a desired upper bound for the probability of error and a length $a > 0$.
For  the  purposes  of  this  section,  the  information  we  shall  require  about  the 
method  discussed  in  the  previous  sections  can  be  summarised  as  follows. 

Summary. There exists a constant $P > 0$ and numbers $N, L > 1$ depending
on $\alpha^*$, $\epsilon$ and $\delta$ with the following property. Suppose $\delta \ge \eta' > 0$ and that
$k \ge 2$ is an integer. Suppose further that $T'\eta' = P$ and that $N' = [N \log k]$
(i.e., $N'$ is the integer part of $N \log k$). Let us take a set $E'$ of $N'$ points at
random on $[-T', T']$. Suppose we are now given $\Lambda'$, a union of intervals
each of length greater than $\eta'$, and such that their total length $|\Lambda'|$ satisfies
$|\Lambda'|\eta'^{-1} = 2a\delta^{-1}$. Then the method of the previous sections will locate all
important frequencies lying in $\Lambda'$ (together, possibly, with other frequencies
but without false positives) to within $\eta'$ with probability at least $1 - \epsilon/k$. The
number of computations required will be no greater than $L \log k$.

In  the  theoretical  discussion  that  follows  k  will  be  a  large  integer;  in 
practice  there  are  useful  gains  even  with  k  =  2  or  3.  To  check  the  summary  it 
suffices  to  look  at  the  inequalities  preceding  inequality  (2.5)  and  recall  that 
the 'noise level' $\tau$ is a constant of the system.

Now suppose we are given $\Lambda = [-a, a)$ (or, more generally, a reasonable
set of intervals of total length $2a$). We suppose further that $M\delta/(2a)$
is small compared with 1. Let us set $\Lambda(0) = \Lambda$, $\eta(1) = \delta$, and define $\eta(j)$
inductively for each $2 \le j \le k$ by

$$M\eta(j-1)\eta(j)^{-1} = 2a\delta^{-1}.$$

Let $T(j)$ be chosen so that $T(j)\eta(j) = P$ and choose a set $E(j)$ of $N' = [N \log k]$
points at random on $[-T(j), T(j)]$.

Referring to the summary we see that we can use the method of the previous
sections applied to $E(1)$ to locate the, at most, $M$ important frequencies
in a set $\Lambda(1)$ consisting of, at most, $M$ intervals each of length $2\eta(1)$. Now,
using this knowledge, we may apply the method to $E(2)$ to locate the, at
most, $M$ important frequencies in a subset $\Lambda(2)$ of $\Lambda(1)$ consisting of, at
most, $M$ intervals each of length $2\eta(2)$.

Continuing through $k$ stages we arrive at a set $\Lambda(k)$ consisting of at
most $M$ intervals of length $2\eta(k)$. Provided that nothing has gone wrong,
each interval contains exactly one frequency $\lambda_s$ and each important frequency
lies in an interval. There is, of course, a positive probability of failure at each
step, but, looking at the summary we see that this is less than $\epsilon/k$. The
probability that anything has gone wrong, and that the second half of the
second sentence of this paragraph could be false, is thus less than $\epsilon$.

We have thus used $kL \log k$ computations and $kN \log k$ sample points
to locate all important frequencies (and, possibly, some others) to within
$\delta(M\delta/2a)^{k-1}$ with a probability of error of less than $\epsilon$. In other words,
with probability at least $1 - \epsilon$ we have achieved an accuracy of $\eta$ in locating
important frequencies, using only of the order of $\log \eta^{-1} \log\log \eta^{-1}$ computations
and of the order of $\log \eta^{-1} \log\log \eta^{-1}$ sample points. Provided we
know $\min |\alpha_j|$ (and provided that this minimum is nonzero) we can choose
$\alpha^*$ sufficiently small that all the frequencies $\lambda_m$ are important frequencies. In
this way we have fulfilled the promise made in the penultimate paragraph
of Section 1.
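The stage-by-stage bookkeeping above can be sketched in a few lines of Python. This is only an illustration of the recursion $M\eta(j-1)\eta(j)^{-1} = 2a\delta^{-1}$ and the choice $T(j)\eta(j) = P$. The function name `multistage_schedule` is hypothetical, and the default values of the constants $P$ and $N$ are arbitrary placeholders, since the paper does not give numerical values for them.

```python
import math


def multistage_schedule(delta, a, M, k, P=1.0, N=100.0):
    """Return the accuracy eta(j), window T(j) and sample count for stages 1..k.

    P and N stand for the constants of the Summary; their defaults here are
    placeholders chosen for illustration, not values from the paper.
    """
    ratio = M * delta / (2.0 * a)  # eta(j) / eta(j-1)
    if not 0.0 < ratio < 1.0:
        raise ValueError("M*delta/(2a) must lie strictly between 0 and 1")
    if k < 2:
        raise ValueError("k must be an integer >= 2")
    points_per_stage = int(N * math.log(k))  # N' = [N log k]
    stages = []
    eta = delta  # eta(1) = delta
    for j in range(1, k + 1):
        if j > 1:
            eta *= ratio
        stages.append(
            {"stage": j, "eta": eta, "T": P / eta, "points": points_per_stage}
        )
    return stages


if __name__ == "__main__":
    for stage in multistage_schedule(delta=0.1, a=10.0, M=4, k=5):
        print(stage)
```

The final accuracy is $\eta(k) = \delta(M\delta/2a)^{k-1}$, so the accuracy improves geometrically in $k$ while the number of sample points grows only like $k \log k$.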

We  conclude  with  a  series  of  remarks. 

Remark 4.1. Although we can locate the $\lambda_m$ to arbitrarily high accuracy by
increasing $k$, the power of discrimination of the method (i.e., the smallest
value of $\lambda^* = \min_{1 \le r < s \le M} |\lambda_r - \lambda_s|$ with which it is guaranteed to cope) is of
the order of $\eta(1) = \delta$. The number of operations and observations required
for a given power $\delta$ of discrimination remain governed by the estimates of
Section 3.

Remark 4.2. So far we have concentrated on obtaining good estimates for
the frequencies $\lambda_m$ and more or less ignored the associated amplitudes $\alpha_m$.
However, once the frequencies are known to high accuracy the amplitudes
are easily obtained using the method of least squares. In particular, suppose
we are given a $T$ such that $T \min_{1 \le r < s \le M} |\lambda_r - \lambda_s|$ is very large. Choose
$N$ points $X_1, X_2, \ldots, X_N$ at random on $[-T, T]$. Suppose we have estimates
$\lambda'_m$ of $\lambda_m$ for $1 \le m \le P$ which are sufficiently good to make $T|\lambda_m - \lambda'_m|$
very small. Then similar arguments to those we used in Section 2 show that,
with a probability of error less than some bound fixed in advance, we can
estimate the $\alpha_m$ for $1 \le m \le P$ to within an order of $\tau^* N^{-1/2}$ where

$$\tau^{*2} = \sigma^2 + \sum_{j=P+1}^{M} |\alpha_j|^2.$$

For  the  sake  of  theoretical  simplicity  let  us  suppose  that  we  use  a 
completely  different  set  of  random  points  to  estimate  amplitudes  from  the 
set  of  points  that  we  used  to  estimate  the  frequencies.  If  we  use  the  same 
number  of  points  in  each  of  the  two  sets  we  add  hardly  anything  to  the 
number  of  computations  (and  in  particular  we  do  not  change  its  order  of 
magnitude). The errors in our estimates of amplitudes will be of the order
of $\tau^*$ divided by the square root of the total number of observations.
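The least-squares step of Remark 4.2 can be illustrated as follows. This is a minimal sketch, not the author's implementation: it assumes the frequencies are already known, forms the normal equations for the model $\sum_m \alpha_m \exp(i\lambda_m x)$ at randomly chosen points, and solves them by Gaussian elimination. The names `solve_linear` and `fit_amplitudes` are hypothetical.

```python
import cmath
import random


def solve_linear(A, b):
    """Gaussian elimination with partial pivoting for a small complex system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0j] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def fit_amplitudes(xs, values, freqs):
    """Least-squares amplitudes for sum_m alpha_m exp(i lambda_m x)."""
    P = len(freqs)
    # Design matrix: E[n][m] = exp(i * lambda_m * x_n)
    E = [[cmath.exp(1j * lam * x) for lam in freqs] for x in xs]
    # Normal equations (E^H E) alpha = E^H values
    G = [[sum(E[n][r].conjugate() * E[n][s] for n in range(len(xs)))
          for s in range(P)] for r in range(P)]
    rhs = [sum(E[n][r].conjugate() * values[n] for n in range(len(xs)))
           for r in range(P)]
    return solve_linear(G, rhs)


if __name__ == "__main__":
    random.seed(0)
    freqs, true_amps = [1.0, 2.5], [2 + 1j, -0.5j]
    xs = [random.uniform(-50.0, 50.0) for _ in range(200)]
    values = [sum(a * cmath.exp(1j * lam * x) for a, lam in zip(true_amps, freqs))
              for x in xs]
    print(fit_amplitudes(xs, values, freqs))
```

When $T$ times the smallest frequency separation is large, the matrix $E^H E$ is close to $N$ times the identity, which is why the amplitude errors behave like a noise level divided by $\sqrt{N}$.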

Remark 4.3. The idea of Remark 4.2 can be taken slightly further. Once we
have good estimates $\lambda'_m$ and $\alpha'_m$ for the $\lambda_m$ and $\alpha_m$ with $1 \le m \le P$ we
can apply our method (possibly using a different set of observations) to $g$
defined by

$$g(x) = f(x) - \sum_{m=1}^{P} \alpha'_m \exp(i\lambda'_m x).$$

Since, in the spirit of Section 3, we could say that $g$ is associated with a lower
artificially induced noise level (at least, if we restrict the range over which
we take observation points appropriately) we should expect to be able to
estimate further frequencies $\lambda_m$ and amplitudes $\alpha_m$ for which $|\alpha_m|$ is smaller
than before. However, the new frequency and amplitude estimates will be
less accurate than those previously obtained. If we take $|\alpha_1| \ge |\alpha_2| \ge \cdots \ge |\alpha_M|$
then this sort of idea will only enable us to estimate frequencies $\lambda_m$ and
amplitudes $\alpha_m$ for $1 \le m \le Q$ using a total of $N$ observations on condition
that $|\alpha_Q|^2$ is reasonably large compared with $N^{-1}\bigl(\sigma^2 + \sum_{j=Q+1}^{M} |\alpha_j|^2\bigr)$.

Remark 4.4. If we consider the multidimensional generalisation of our
problem to a function $f : \mathbb{R}^d \to \mathbb{C}$ given by

$$f(x) = \sum_{m=1}^{M} \alpha_m \exp(i\lambda_m \cdot x)$$

with $\alpha_m \in \mathbb{C}$ and $\lambda_m \in \mathbb{R}^d$ $[1 \le m \le M]$, then our methods can be applied
essentially unchanged and retain their advantages.

Remark  4.5.  It  would  be  in  keeping  with  the  philosophy  of  this  paper  to 
suggest  that  if  a  long  term  sequence  of  expensive  observations  is  to  be  made 
the  times  of  observation  should  be  chosen  according  to  a  Poisson  process. 
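As a small illustration of Remark 4.5, observation times following a homogeneous Poisson process can be generated by summing independent exponential gaps. The sketch below is an assumption-laden example rather than anything prescribed by the paper: the function name `poisson_observation_times`, the rate and the horizon are all hypothetical.

```python
import random


def poisson_observation_times(rate, horizon, rng=None):
    """Times of a homogeneous Poisson process of the given rate on [0, horizon)."""
    rng = rng or random.Random()
    times = []
    t = rng.expovariate(rate)  # first exponential waiting time
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)  # independent gaps with mean 1/rate
    return times


if __name__ == "__main__":
    schedule = poisson_observation_times(rate=2.0, horizon=10.0, rng=random.Random(1))
    print(len(schedule), schedule[:5])
```

Conditional on the number of points in the interval, such times are distributed like independent uniform samples, which links this suggestion to the random sampling used in the earlier sections.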
Remark 4.6. It is not hard to produce rigorous proofs of the statements
of this paper using the methods of [1] (see in particular pages 67 to 70).
However  the  constants  so  obtained  are  misleadingly  pessimistic.  In  practice 
it  seems  best  to  'tune'  the  method  on  some  particular  problem.  (If  you  graph 
the  output  the  'noise  level'  can  be  seen  very  clearly.)  Once  this  is  done  the 
formulae  of  this  paper  show  how  the  number  of  sample  points,  etc.,  need  to 
be  changed  for  other  problems. 

Remark  4.7.  Simple  numerical  experimentation  at  Prometheus  Inc.  suggests 
that  the  methods  of  this  paper  are  impractical  without  a  modern  desktop 
computer,  but  that,  once  a  speed  of  lO'’  operations  per  second  is  available, 
realistic problems can be tackled. They are easy to program and I suggest
that interested readers try them for themselves.
We give sufficient conditions for one generalized Rademacher-Riesz product
to be equivalent to another. Also, we discuss criteria for mutual singularity
of these measures.

1. Introduction

In this talk, we deal with Riesz product type measures based on a certain
sequence of independent random variables with zero mean. A particular
case of these measures are the Rademacher-Riesz products on $[0,1]$, that is,
measures of the form:

$$d\mu = \lim_{n \to \infty} \prod_{k=1}^{n} (1 + a_k r_k)\,dx,$$

where $|a_k| \le 1$ and the limit is taken in the weak* sense. As is known, the
Rademacher function $r_k$ is defined by $r_k(x) = 1 - 2\epsilon_k$, where $\epsilon_k$ is the $k$-th
digit of the dyadic expansion of $x$.

Generalized  Rademacher-Riesz  products  have  been  introduced  in  [3] 
and  the  entropy  of  such  a  measure  has  been  determined.  This  problem  was 
reduced  to  the  calculation  of  the  Hausdorff  dimension  of  the  support  of  this 
measure;  the  support  of  a  generalized  Rademacher-Riesz  product  is  under 
some  circumstances  of  fractal  type.  Exact  knowledge  of  these  circumstances 
depends  on  some  criteria  for  singularity  or  absolute  continuity  of  these 
measures.  Thus,  it  is  natural  to  ask  whether  the  supports  of  two  generalized 
Rademacher-Riesz  products  coincide  or  are  disjoint. 

In  this  work,  our  main  purpose  is  to  give  some  simple  criteria  for 
equivalence  or  mutual  singularity  of  two  generalized  Rademacher-Riesz 
products,  so  that,  the  results  obtained  will  be  immediately  adaptable  for  the 
case  of  ordinary  Rademacher-Riesz  products. 

Some central results concerning mutual singularity or equivalence for
the classical trigonometric Riesz products have been derived by several
authors. In [6] Peyrière gives a very simple criterion for two Riesz product
measures to be mutually singular. A little later, in [1], Brown and Moran
obtained, among other things, sufficient conditions for equivalence of Riesz
products and their result was improved by Ritter in [7]. In addition, Kilmer
and Saeki (cf. [4]) have contributed to the study of the same subject. More
recently, Parreau (cf. [5]) gave a criterion for the equivalence of generalized
trigonometric Riesz products.

Here,  we  treat  measures  of  a  different  kind  from  those  mentioned  above 
by  means  of  novel  methods.  The  main  results  are  derived  via  probabilistic 
tools,  namely,  martingale  theory.  We  note  also  that,  though  our  measures  are 
closely  related  to  infinite  product  measures,  in  the  sense  of  [2],  the  results  are 
obtained  without  any  use  of  Kakutani's  well  known  criterion  for  comparing 
product  measures. 

In  the  next  section  we  give  some  preliminary  notions  and  we  recall  the 
definition  of  the  generalized  Rademacher-Riesz  product  measure.  Also,  we 
state  the  propositions  which  are  the  key  in  our  method. 


2.  Preliminaries 


Let $r$ be a positive integer ($r \ge 2$) and let $\epsilon_n$ be the $n$-th digit of the expansion
of $x \in [0,1)$ in the base $r$. We shall call a cylinder of order $n$ an $r$-adic interval
of the form:

$$E_{n,i} = \left[\frac{i-1}{r^n}, \frac{i}{r^n}\right) \quad \text{for } i = 1, 2, \ldots, r^n.$$


We define the sequences of functions $(R_n^i)_{n=1}^{\infty}$ for $i = 0, 1, \ldots, r-1$ on $[0,1]$
as follows:

$$R_n^i(x) = 1 - r\,\delta_{i,\epsilon_n(x)},$$

where $\delta_{i,j}$ is the usual Kronecker's symbol and $R_n^i$ is zero on each $r$-adic
rational. We call the sequence $(R_n^i)_{n=1}^{\infty}$ the system of $r$-adic Rademacher
functions associated with the digit $i$.

Let $(a_n^{(0)}), (a_n^{(1)}), \ldots, (a_n^{(r-1)})$ be sequences of real numbers, satisfying
$a_n^{(i)} \ge 0$ and $\sum_{i=0}^{r-1} a_n^{(i)} = 1$ for all $n$. We define the sequence of random
variables $(X_k)_{k=1}^{\infty}$ on $[0,1]$ as follows:

$$X_k = \sum_{i=0}^{r-1} a_k^{(i)} R_k^i \quad \text{for } k = 1, 2, \ldots$$

It is plain that $(X_k)_{k=1}^{\infty}$ are independent random variables with zero mean
on the usual probability space of $[0,1]$.
Let $E_n(x)$ be the cylinder of order $n$ which contains $x$ for $n = 1, 2, \ldots$
and let $E_0(x) = [0,1]$. We define

$$\mu(E_n(x)) = \frac{1}{r^n} \prod_{k=1}^{n} (1 - X_k(x)).$$

If $E_{n,j} = \bigcup_{s=0}^{r-1} E_{n+1,s}$ is the partition of $E_{n,j}$ into cylinders of order $n+1$, it
is easy to see that the set function $\mu$ satisfies the conditions:

1) $$\mu(E_{n,j}) = \sum_{s=0}^{r-1} \mu(E_{n+1,s}) \quad \text{for every } n \text{ and } j = 1, 2, \ldots, r^n.$$

2) $$\sum_{j=1}^{r^n} \mu(E_{n,j}) = 1 \quad \text{for all } n.$$

Therefore, $\mu$ may be extended to a Borel probability measure on $[0,1]$ in
a unique manner. It is clear that $\mu$ is the weak* limit of the sequence of
measures $\mu_n$ defined by $d\mu_n = \prod_{k=1}^{n} (1 - X_k)\,d\lambda$, where $\lambda$ is the Lebesgue
measure on the $\sigma$-algebra $B$ of the Borel subsets of $[0,1]$. We call such a measure
$\mu$ a generalized Rademacher-Riesz product (R.R. product) associated
with the sequences $(a_n^{(i)})$, and in what follows we shall employ the notation

$$d\mu = \prod_{n=1}^{\infty} (1 - X_n)\,d\lambda. \qquad (2.1)$$
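The definition of $\mu$ on cylinders can be made concrete with a short Python sketch. It is illustrative only and rests on the reading $R_n^i(x) = 1 - r\,\delta_{i,\epsilon_n(x)}$ of the $r$-adic Rademacher functions, under which $1 - X_k$ equals $r\,a_k^{(i)}$ on points whose $k$-th digit is $i$. The function names `rademacher` and `cylinder_measure` and the sample weights are hypothetical.

```python
from fractions import Fraction
from itertools import product


def rademacher(i, digit, r):
    """R_n^i at a point whose n-th base-r digit is `digit` (assumed form 1 - r*delta)."""
    return 1 - r * (1 if digit == i else 0)


def cylinder_measure(digits, weights, r):
    """mu of the cylinder fixed by `digits`: r^(-n) * prod_k (1 - X_k).

    weights[k][i] plays the role of the coefficient a^(i) at level k + 1;
    each row must be non-negative and sum to 1.
    """
    measure = Fraction(1)
    for k, digit in enumerate(digits):
        X_k = sum(weights[k][i] * rademacher(i, digit, r) for i in range(r))
        measure *= (1 - X_k) / r
    return measure


if __name__ == "__main__":
    r = 3
    weights = [
        [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)],
        [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)],
    ]
    total = sum(cylinder_measure(d, weights, r) for d in product(range(r), repeat=2))
    print(total)  # the cylinders of order 2 carry total mass 1
```

Under this reading $\mu$ is simply the product measure in which the $k$-th digit takes the value $i$ with probability $a_k^{(i)}$, which makes conditions 1) and 2) immediate.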

Remark 2.1. In the case where $r = 2$ we have $R_n^1 = r_n$ and $R_n^0 = -r_n$, where
$(r_n)_{n=1}^{\infty}$ are the usual Rademacher functions on $[0,1]$. If we take
$a_n^{(0)} = (1 + a_n)/2$, $a_n^{(1)} = (1 - a_n)/2$, where $|a_n| \le 1$, the measure (2.1) has the form

$$d\mu = \prod_{n=1}^{\infty} (1 + a_n r_n)\,d\lambda,$$

which is an ordinary Rademacher-Riesz product on $[0,1]$.

One  can  prove  easily  the  following  elementary  lemma: 

Lemma 2.2. Let $\mu$ be a generalized R.R. product as in (2.1). Then, we have
the following conclusions:

1) The random variables $(R_n^i)_{n=1}^{\infty}$ are independent on the probability
space $([0,1], B, \mu)$.
2) Let

$$U_n = c_n + \sum_{i=0}^{r-1} c_n^{(i)} R_n^i,$$

where $(c_n)$, $(c_n^{(i)})$ are sequences of real numbers such that at least one
$c_n^{(i)} \neq 0$ and one $c_n^{(i)} \neq 1$ for every $n$. Then the random variables $(U_n)$
are independent on the probability space $([0,1], B, \mu)$.

3) We further have

$$\int_0^1 R_n^i\,d\mu = 1 - r a_n^{(i)}.$$

We also need the following proposition which is a simple corollary of
the martingale convergence theorem:

Proposition 2.3. Let $(f_n, F_n)_n$ be a martingale on a probability space
$(\Omega, F, P)$. We suppose that

$$\sup_n \int_\Omega |f_n|^\alpha\,dP < +\infty \quad \text{for some } \alpha > 1.$$

Then, there exists a function $f$ such that:

1) $$\int_\Omega |f|\,dP < +\infty,$$

2) $f_n(x) \to f(x)$, $P$-almost everywhere, and

3) $$\int_\Omega |f_n - f|\,dP \to 0 \quad \text{as } n \to \infty.$$

For a proof see [8, p. 477-478].
3. Criterion for equivalence of two generalized R.R. products


We shall prove the following proposition:

Proposition 3.1. Let $d\mu = \prod_{n=1}^{\infty} (1 - X_n)\,d\lambda$ and $d\nu = \prod_{n=1}^{\infty} (1 - Y_n)\,d\lambda$,
where $X_n = \sum_{i=0}^{r-1} a_n^{(i)} R_n^i$, $Y_n = \sum_{i=0}^{r-1} b_n^{(i)} R_n^i$. If

$$\sum_{n=1}^{\infty} \sum_{i=0}^{r-1} \frac{(a_n^{(i)} - b_n^{(i)})^2}{a_n^{(i)} b_n^{(i)}} < +\infty, \qquad (3.1)$$

then $\mu$, $\nu$ are equivalent measures (i.e., $\mu \ll \nu$ and $\nu \ll \mu$).

Proof. It is enough to show that $\mu \ll \nu$. We consider the sequence of
functions:

$$F_n = \prod_{k=1}^{n} \frac{1 - X_k}{1 - Y_k} \quad \text{for } n = 1, 2, \ldots$$


Let $B_n$ be the $\sigma$-algebra generated by $\{E_{n,i} : 1 \le i \le r^n\}$. It is easy to see that
the stochastic sequence $(F_n, B_n)_{n \ge 1}$ is a martingale on the probability space
$([0,1], B, \nu)$. We observe that

$$\int_0^1 \frac{1 - X_k}{1 - Y_k}\,d\nu = 1. \qquad (3.3)$$


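We can further have

$$\frac{1 - X_k}{1 - Y_k} = \frac{1}{r} \sum_{i=0}^{r-1} \frac{a_k^{(i)}}{b_k^{(i)}} (1 - R_k^i), \qquad \left(\frac{1 - X_k}{1 - Y_k}\right)^2 = \frac{1}{r} \sum_{i=0}^{r-1} \left(\frac{a_k^{(i)}}{b_k^{(i)}}\right)^2 (1 - R_k^i). \qquad (3.4)$$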


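Using (3.4) and Lemma 2.2, we obtain

$$\int_0^1 F_n^2\,d\nu = \prod_{k=1}^{n} \left(1 + \sum_{i=0}^{r-1} \frac{(a_k^{(i)} - b_k^{(i)})^2}{b_k^{(i)}}\right). \qquad (3.5)$$

It follows from our assumption (3.1) that

$$\sum_{n=1}^{\infty} \sum_{i=0}^{r-1} \frac{(a_n^{(i)} - b_n^{(i)})^2}{b_n^{(i)}} < +\infty;$$

and finally from (3.5) and the above we have

$$\sup_n \int_0^1 F_n^2\,d\nu < +\infty. \qquad (3.6)$$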
Now, according to Proposition 2.3, relation (3.6) implies that $F_n$ converges
in $L^1(\nu)$. Furthermore, $F_n$ converges $\nu$-almost everywhere. Let
$F = \lim_{n \to \infty} F_n$ ($\nu$-a.e.). Since, by (3.3),

$$\int_0^1 F_n\,d\nu = 1,$$

it follows that

$$\int_0^1 F\,d\nu = 1. \qquad (3.7)$$

If $\mu = \mu_1 + \mu_2$ is the Lebesgue decomposition of $\mu$ into absolutely continuous
and singular parts, respectively, with respect to $\nu$, it is well-known that (see
[8, p. 493])

$$\mu_1(A) = \int_A F\,d\nu \quad \text{for every } A \in B. \qquad (3.8)$$

Combining (3.7) with (3.8) it follows that $\mu_2 = 0$, therefore $\mu \ll \nu$, and
the proof is complete.


Remark 3.2. It is an immediate consequence of the proposition just proved
that if for two generalized R.R. products $\mu$, $\nu$ associated with $(a_n^{(i)})$ and $(b_n^{(i)})$
respectively,

$$\sum_{n=1}^{\infty} \sum_{i=0}^{r-1} (a_n^{(i)} - b_n^{(i)})^2 < +\infty$$

and

$$\liminf_{n \to \infty} B_n > 0, \quad \text{where } B_n = \min_{0 \le i \le r-1} b_n^{(i)}, \qquad (3.9)$$

hold true, then $\mu$, $\nu$ are equivalent. It is remarkable that one can prove
this assertion without the application of the Lebesgue decomposition used
above. It suffices to observe that (3.9) implies $\limsup_{n \to \infty} B_n^* < 1$, where
$B_n^* = \max_{0 \le i \le r-1} b_n^{(i)}$, which in turn implies that $\nu$ is a continuous measure.
Hence, the sequence of measures $\omega_n$ defined by $d\omega_n = F_n\,d\nu$ converges
to $\mu$ in the weak* topology. This and the convergence of $F_n$ in $L^1(\nu)$ yield
$\mu \ll \nu$.

In the case where $\mu$, $\nu$ are two ordinary R.R. products associated with
$(a_n)$, $(b_n)$ respectively, Proposition 3.1 is stated as follows:

Corollary 3.3. Let $d\mu = \prod_{n=1}^{\infty} (1 + a_n r_n)\,d\lambda$ and $d\nu = \prod_{n=1}^{\infty} (1 + b_n r_n)\,d\lambda$.
If

$$\sum_{n=1}^{\infty} \frac{(a_n - b_n)^2}{(1 - a_n^2)(1 - b_n^2)} < +\infty,$$

then $\mu$, $\nu$ are equivalent measures.

Proof. It suffices to observe that in this case (3.5) has the form:

$$\int_0^1 F_n^2\,d\nu = \prod_{k=1}^{n} \left(1 + \frac{(a_k - b_k)^2}{1 - b_k^2}\right).$$

Remark 3.4. It is now clear that for two R.R. products associated with
$(a_n)$, $(b_n)$ respectively, if $\sum_{n=1}^{\infty} (a_n - b_n)^2 < +\infty$ and $\limsup_{n \to \infty} |b_n| < 1$,
then $\mu$, $\nu$ are equivalent measures. By virtue of Corollary 3.3, $\mu$, $\nu$ may be
equivalent in some cases where $\sum_{n=1}^{\infty} (a_n - b_n)^2 < +\infty$ whereas $|a_n|$, $|b_n|$
approach 1. Consider, for example, $a_n = 1 - c_n - c_n^2$, $b_n = 1 - c_n$, where
$c_n > 0$ and $\sum_{n=1}^{\infty} c_n^2 < +\infty$. However, as we shall see in the next section, in
the case where $\sum_{n=1}^{\infty} (a_n - b_n)^2 < +\infty$ and $|a_n|$, $|b_n|$ approach 1, $\mu$, $\nu$ may
be mutually singular under some further assumptions for $(a_n)$ and $(b_n)$.

4.  Mutual  singularity  of  generalized  R.R.  products 

It can be easily shown that if $\mu$, $\nu$ are generalized R.R. products associated
with $(a_n^{(i)})$ and $(b_n^{(i)})$ respectively and $\sum_{n=1}^{\infty} \sum_{i=0}^{r-1} (a_n^{(i)} - b_n^{(i)})^2 = +\infty$ then $\mu$, $\nu$
are mutually singular. This may happen in some cases where the above
series converges, provided of course that the condition (3.9) is dropped. The
following proposition gives us further information about mutual singularity
of generalized R.R. products:

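Proposition 4.1. We suppose $d\mu = \prod_{n=1}^{\infty} (1 - X_n)\,d\lambda$ and $d\nu = \prod_{n=1}^{\infty} (1 - Y_n)\,d\lambda$,
where $X_n = \sum_{i=0}^{r-1} a_n^{(i)} R_n^i$, $Y_n = \sum_{i=0}^{r-1} b_n^{(i)} R_n^i$. Let us put

$$\sigma_n^2 = \sum_{i=0}^{r-1} \frac{(a_n^{(i)} - b_n^{(i)})^2}{b_n^{(i)}}, \qquad M_n = \max_{0 \le i \le r-1} \frac{a_n^{(i)}}{b_n^{(i)}}.$$

If

$$\sup_n M_n < +\infty \quad \text{and} \quad \sum_{n=1}^{\infty} \sigma_n^2 = +\infty,$$

then $\mu$, $\nu$ are mutually singular.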
Proof. There exists a sequence of real numbers $(u_n)$ such that

$$u_n > 0, \qquad \sum_{n=1}^{\infty} \sigma_n^2 u_n = +\infty, \qquad \sum_{n=1}^{\infty} \sigma_n^2 u_n^2 < +\infty.$$

We define the sequence of random variables

$$\xi_n = \frac{Y_n - X_n}{1 - Y_n}\,u_n \quad \text{for } n = 1, 2, \ldots$$

Using Lemma 2.2 stated above and formula (3.3) it follows that $(\xi_n)_n$ are
independent random variables with respect to $\nu$, such that

$$\int_0^1 \xi_n\,d\nu = 0.$$

From formula (3.4) we obtain

$$\int_0^1 \xi_n^2\,d\nu = u_n^2 \sigma_n^2.$$

Now, we define

$$\zeta_n = \xi_n - u_n \sigma_n^2 \quad \text{for } n = 1, 2, \ldots$$

We observe that $(\zeta_n)_{n=1}^{\infty}$ are independent random variables with respect to
$\mu$, and using again (3.3), (3.4) and Lemma 2.2, we obtain

$$\int_0^1 \xi_n\,d\mu = u_n \sigma_n^2, \quad \text{hence} \quad \int_0^1 \zeta_n\,d\mu = 0.$$

It is not hard to see that

$$\int_0^1 \zeta_n^2\,d\mu \le M_n u_n^2 \sigma_n^2.$$

Thus, by the Khinchin-Kolmogorov Theorem (see [8, p. 359]) we conclude

$$\sum_{n=1}^{\infty} \xi_n(x) \text{ converges } \nu\text{-a.e.}$$

and

$$\sum_{n=1}^{\infty} \zeta_n(x) \text{ converges } \mu\text{-a.e.}$$

If there is some $x_0$ such that both $\sum_{n=1}^{\infty} \xi_n(x_0)$ and $\sum_{n=1}^{\infty} \zeta_n(x_0)$ converge,
it follows that $\sum_{n=1}^{\infty} u_n \sigma_n^2 < +\infty$; contradiction. This proves that $\mu$ and $\nu$
are mutually singular, as claimed.
For the case of two ordinary Rademacher-Riesz products $\mu$ and $\nu$ we
have the following:

Corollary 4.2. Let $d\mu = \prod_{n=1}^{\infty} (1 + a_n r_n)\,d\lambda$ and $d\nu = \prod_{n=1}^{\infty} (1 + b_n r_n)\,d\lambda$;
if

$$\sum_{n=1}^{\infty} \frac{(a_n - b_n)^2}{1 - b_n^2} = +\infty \quad \text{and} \quad \sup_n \frac{1 - a_n^2}{1 - b_n^2} < +\infty,$$

then $\mu$ and $\nu$ are mutually singular.

This follows from the proof of Proposition 4.1 under some minor
modifications.

We complete this paper with the promised example of mutually singular
Rademacher-Riesz products $\mu$, $\nu$ associated with $(a_n)$, $(b_n)$, in the case
where $\sum_{n=1}^{\infty} (a_n - b_n)^2 < +\infty$ and $|a_n|$, $|b_n|$ approach 1. According to
Corollary 4.2, it suffices to take $a_n = 1 - c_n$, $b_n = 1 - 2c_n$, where $0 < c_n < \tfrac{1}{2}$,
$\sum_{n=1}^{\infty} c_n = +\infty$, $\sum_{n=1}^{\infty} c_n^2 < +\infty$. Notice also that these measures $\mu$ and $\nu$
are continuous.
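The closing example can be explored numerically. The sketch below is illustrative only: it assumes the particular choice $c_n = 1/(n+2)$ (which satisfies $0 < c_n < 1/2$, $\sum c_n = +\infty$, $\sum c_n^2 < +\infty$) and reads the two conditions of Corollary 4.2 as the divergence of $\sum (a_n - b_n)^2/(1 - b_n^2)$ and the boundedness of $(1 - a_n^2)/(1 - b_n^2)$. The function names `example_terms` and `partial_sums` are hypothetical.

```python
def example_terms(n):
    """Terms for a_n = 1 - c_n, b_n = 1 - 2 c_n with the assumed choice c_n = 1/(n+2)."""
    c = 1.0 / (n + 2)
    a = 1.0 - c
    b = 1.0 - 2.0 * c
    gap_sq = (a - b) ** 2                  # (a_n - b_n)^2 = c_n^2
    singular_term = gap_sq / (1.0 - b * b)  # summand of the divergent series
    ratio = (1.0 - a * a) / (1.0 - b * b)   # quantity required to stay bounded
    return gap_sq, singular_term, ratio


def partial_sums(N):
    """Partial sums up to N of both series, and the largest ratio seen."""
    sum_gap = 0.0
    sum_singular = 0.0
    max_ratio = 0.0
    for n in range(1, N + 1):
        gap_sq, singular_term, ratio = example_terms(n)
        sum_gap += gap_sq
        sum_singular += singular_term
        max_ratio = max(max_ratio, ratio)
    return sum_gap, sum_singular, max_ratio


if __name__ == "__main__":
    for N in (100, 1000, 10000):
        print(N, partial_sums(N))
```

With this choice the summand $(a_n - b_n)^2/(1 - b_n^2)$ equals $1/(4(n+1))$, so the second partial sum grows like $\tfrac{1}{4}\log N$ while the first stays bounded.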
Second order Hamilton-Jacobi equations in infinite dimensions are semilinear
parabolic equations in which the unknown function $u(t, x)$ is defined
for real $t$ and $x$ belonging to a Hilbert space $X$. Our presentation will focus
on the relationship between such equations and stochastic optimal control
of distributed parameter systems.

Perturbation methods can be used to study Hamilton-Jacobi equations.
Such methods are based on a detailed analysis of the linearized
problem, which is related to solutions of some stochastic partial differential
equations. We will describe two different approaches to the linearized
equation: one is based on the probabilistic representation formula for the
solution, the other uses just functional analysis.


1. Finite dimensional optimal control

Let us consider a free terminal point problem of Mayer type. More precisely,
one is given a complete separable metric space $U$ (the control space) and a
system $x(\cdot)$ governed by the state equation

$$\begin{cases} x'(t) = f(t, x(t), u(t)), & t_0 \le t \le T, \\ x(t_0) = x_0 \in \mathbb{R}^n. \end{cases} \qquad (1.1)$$

Here $u$ is a measurable map $u : [0,T] \to U$ (usually called a control) and
$f : [0,T] \times \mathbb{R}^n \times U \to \mathbb{R}^n$ is a continuous vector field satisfying

$$\begin{cases} \|f(t,x,u)\| \le C(1 + \|x\|), \\ \|f(t,x,u) - f(t,y,u)\| \le C\|x - y\| \end{cases} \qquad (1.2)$$

for all $x, y \in \mathbb{R}^n$, $t \in [0,T]$, and $u \in U$. Notice that, under the above
assumptions, for any initial data $(t_0, x_0) \in [0,T] \times \mathbb{R}^n$ and control $u$ the state
equation (1.1) has a unique solution $x(\cdot\,; t_0, x_0, u) \in C([0,T]; \mathbb{R}^n)$.

Given a Lipschitz continuous function $g : \mathbb{R}^n \to \mathbb{R}$, our optimal control
problem consists of

minimizing $g(x(T; t_0, x_0, u))$ over all controls $u$. (1.3)

A control $u$ at which the minimum is attained is said to be an optimal
control and the corresponding trajectory $x(\cdot)$ is called an optimal trajectory.

The ultimate goal of optimal control theory is to give conditions for
optimality. The following is just an example of a result that states the existence
of optimal controls (see for instance [11]).

Proposition 1.1. Assume (1.2) and suppose further that $f(t, x, U)$ is compact
and convex for all $(t, x) \in [0,T] \times \mathbb{R}^n$. Then, problem (1.1)-(1.3) has at least
one optimal solution.

Let now $f$ be differentiable with respect to $x$ and $g$ be differentiable. Fix
a control $u$ and set $x(\cdot) = x(\cdot\,; t_0, x_0, u)$.

Definition 1.2. The adjoint state of the pair $u$, $x$ is the solution $p$ of the adjoint
system

$$\begin{cases} -p'(t) = [D_x f(t, x(t), u(t))]^* p(t), \\ -p(T) = Dg(x(T)). \end{cases}$$

A useful tool in the construction of optimal trajectories is the value
function defined as

$$V(t_0, x_0) = \inf \{ g(x(T; t_0, x_0, u)) \mid u : [0,T] \to U \text{ measurable} \}.$$

Under assumptions (1.2), it is easy to see that $V$ is Lipschitz continuous in
$[0,T] \times \mathbb{R}^n$. However, $V$ is not differentiable in general (see, e.g., [11]).

For any $(t, x, p) \in [0,T] \times \mathbb{R}^n \times \mathbb{R}^n$ let us set

$$H(t, x, p) = \sup_{u \in U} p \cdot f(t, x, u). \qquad (1.4)$$
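The Hamiltonian (1.4) is easy to approximate when the control set is replaced by a finite sample. The following Python sketch is illustrative only: the function name `hamiltonian`, the scalar dynamics $f(t, x, u) = u$ and the grid of controls in $[-1, 1]$ are assumptions made for the example, for which the exact value is $H(t, x, p) = |p|$.

```python
def hamiltonian(t, x, p, f, controls):
    """Approximate H(t, x, p) = sup_u p . f(t, x, u) over a finite list of controls."""
    return max(
        sum(p_i * f_i for p_i, f_i in zip(p, f(t, x, u)))
        for u in controls
    )


def simple_dynamics(t, x, u):
    """Assumed example dynamics x' = u in one space dimension."""
    return (u,)


if __name__ == "__main__":
    # Evenly spaced controls in [-1, 1]; both endpoints belong to the grid.
    controls = [-1.0 + 2.0 * k / 200 for k in range(201)]
    for p in ((0.7,), (-2.0,)):
        print(p, hamiltonian(0.0, (0.0,), p, simple_dynamics, controls))
```

For dynamics that are affine in the control the supremum is attained at an extreme point of the control set, so a coarse grid containing the endpoints already gives the exact value in this example.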


Proposition 1.3. Assume (1.2) and suppose further that $f$ is differentiable
with respect to $x$ and $g$ is differentiable. Let $\{u, x\}$ be a control-trajectory
pair, and let $p$ be the corresponding adjoint state. Then, $u$ is optimal if and
only if

$$p(t) \cdot f(t, x(t), u(t)) = H(t, x(t), p(t)) \qquad (1.5)$$

and

$$(H(t, x(t), p(t)), -p(t)) \in D^+ V(t, x(t)) \qquad (1.6)$$

for a.e. $t \in [t_0, T]$.

Equation (1.5) above is the well-known Pontryagin Maximum Principle (see,
e.g., [11]). The inclusion (1.6) is the result of the work of several authors (see,
e.g., [4] and the references cited therein).

The above proposition explains the importance of the value function, as
noted first by Bellman who established the so-called dynamic programming
approach to optimal control. This approach tries to characterize the value
function without computing the infimum in its definition and then derives
information on optimal controls and optimal trajectories using, e.g., (1.6).
The independent characterization of the value function is usually provided
by the Hamilton-Jacobi-Bellman equation

$$-v_t + H(t, x, -D_x v) = 0, \qquad (1.7)$$

which $V$ satisfies for a.e. $(t, x) \in [0,T] \times \mathbb{R}^n$. Equation (1.7), however, has no
classical solutions, in general, and indeed we have already noted that $V$ can
fail to be differentiable. On the other hand, the Cauchy problem consisting
of (1.7) and the terminal condition

$$v(T, x) = g(x) \qquad (1.8)$$

has nonunique solutions in the class of Lipschitz continuous functions. The
problem of finding a suitable class of weak solutions to (1.7)-(1.8), in which
one has both existence and uniqueness of solutions, has been solved by
Crandall and Lions in [7] by considering the so-called viscosity solutions.

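In order to give the definition of these solutions, let us introduce first
the super- and subdifferential of a given function. Let $\Omega \subset \mathbb{R}^n$ be an open
domain, $x_0 \in \Omega$ and $\varphi : \Omega \to \mathbb{R}$.

Definition 1.4. The (possibly empty) sets

$$D^+\varphi(x_0) = \left\{ p \in \mathbb{R}^n \;\middle|\; \limsup_{x \to x_0} \frac{\varphi(x) - \varphi(x_0) - p \cdot (x - x_0)}{|x - x_0|} \le 0 \right\}$$

$$D^-\varphi(x_0) = \left\{ p \in \mathbb{R}^n \;\middle|\; \liminf_{x \to x_0} \frac{\varphi(x) - \varphi(x_0) - p \cdot (x - x_0)}{|x - x_0|} \ge 0 \right\}$$

are called, respectively, the superdifferential and the subdifferential of $\varphi$
at $x_0$.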

Footnote to (1.6): the set $D^+V(t, x(t))$ denotes the superdifferential defined in Definition 1.4.

The following is one of the possible definitions of viscosity solution; see [6].
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Definition  1.5.  A  function  v  G  C(10,  T[x  R'^)  is  said  to  be  a  viscosity  solution 
of  ( 1 .7)  if,  for  all  (t,x)  elO.TlxR", 

-Pt  +  H(t,x.-Px)  ^  0  V(pi,px)  e  D'v(t,x) 

-Pt  +  H(t,x,-px)  ^  0  V(pi,p«)  e  D~v(t,x). 

The  following  result  is  proved  in  [6]. 

Proposition 1.6. Assume (1.2) and define $H$ as in (1.4). Then, the terminal value problem (1.7)-(1.8) has a unique viscosity solution $v \in C([0,T] \times \mathbb{R}^n)$. Moreover, $v$ coincides with the value function $V$ of problem (1.1)-(1.3).

2.  Control  of  distributed  parameter  systems 

In  order  to  motivate  the  analysis  of  infinite  dimensional  systems,  let  us 
discuss  the  following  example. 

Example 2.1. Let $\Omega \subset \mathbb{R}^n$ be an open bounded domain, $\xi \in \Omega$, and fix $x_0 \in L^2(\Omega)$. Let $\phi : \mathbb{R} \to \mathbb{R}$ be a given Lipschitz function.

For any $u \in L^\infty([t_0,T] \times \Omega)$ we denote by $x(t,\xi;t_0,x_0,u)$ the solution of the parabolic equation

$$\begin{cases} x_t(t,\xi) = \Delta_\xi x(t,\xi) + u(t,\xi) & \text{in } (t_0,T] \times \Omega \\ x(t,\xi) = 0 & \text{on } [t_0,T] \times \partial\Omega \\ x(t_0,\xi) = x_0(\xi) \end{cases} \tag{2.1}$$

Given $M > 0$, the problem of minimizing $\int_\Omega \phi(x(T,\xi;t_0,x_0,u))\,d\xi$ over all controls $u \in L^\infty([t_0,T] \times \Omega)$ satisfying $|u(t,\xi)| \le M$ is an infinite-dimensional equivalent of the Mayer problem (1.1)-(1.3). A more realistic situation than the one described in (2.1) occurs when the control is concentrated at the boundary of the domain, i.e., enters the state equation as boundary data of Dirichlet (resp. Neumann) type as follows:

$$\begin{cases} x_t(t,\xi) = \Delta_\xi x(t,\xi) & \text{in } (t_0,T] \times \Omega \\ x(t,\xi) = u(t,\xi) \ \big(\text{resp. } \tfrac{\partial x}{\partial \nu}(t,\xi) = u(t,\xi)\big) & \text{on } [t_0,T] \times \partial\Omega \\ x(t_0,\xi) = x_0(\xi) \end{cases} \tag{2.2}$$

Boundary control problems can also be given an abstract formulation, as we will see below.

Let the state space be a separable Hilbert space $X$, with scalar product $\langle \cdot,\cdot \rangle$ and norm $\|\cdot\|$, and let the control space be a complete separable metric space $U$.

Let $A : D(A) \subset X \to X$ be the generator of a strongly continuous semigroup of bounded linear operators on $X$, that will be denoted by $e^{tA}$, $t \ge 0$, with $\|e^{tA}x\| \le \|x\|$, $\forall x \in X$. By the Hille-Yosida Theorem this assumption is equivalent to requiring that $A$ be a densely defined closed linear operator, whose resolvent set contains the positive real axis, satisfying $\|(\lambda - A)^{-1}x\| \le \frac{1}{\lambda}\|x\|$, $\forall \lambda > 0$, $\forall x \in X$.

Let $f : [0,T] \times X \times U \to X$ be a continuous map satisfying the following assumptions, which are the analogue of (1.2):

$$\begin{cases} \|f(t,x,u)\| \le C(1 + \|x\|) \\ \|f(t,x,u) - f(t,y,u)\| \le C\|x - y\| \end{cases} \tag{2.3}$$

for all $x,y \in X$, $t \in [0,T]$, and $u \in U$.

Under the above assumptions, for any $x_0 \in X$ and measurable map $u : [0,T] \to U$, the state equation

$$\begin{cases} x'(t) = Ax(t) + f(t,x(t),u(t)), & t_0 \le t \le T \\ x(t_0) = x_0 \end{cases} \tag{2.4}$$

has a unique mild solution $x(\cdot;t_0,x_0,u) \in C([t_0,T];X)$, i.e.,

$$x(t) = e^{(t-t_0)A}x_0 + \int_{t_0}^{t} e^{(t-s)A} f(s,x(s),u(s))\,ds$$

for all $t \in [t_0,T]$.

Given a Lipschitz function $\varphi : X \to \mathbb{R}$, we are interested in the problem of

$$\text{minimizing } \varphi(x(T;t_0,x_0,u)) \text{ over all measurable controls } u : [0,T] \to U. \tag{2.5}$$


Remark 2.2. Returning to the example above, it is easy to set problem (2.1) in the abstract framework (2.4) taking

$$\begin{cases} X = L^2(\Omega) \\ U = \{u \in L^\infty(\Omega) \mid |u(\xi)| \le M,\ \xi \in \Omega \text{ a.e.}\} \\ f(t,x,u) = u \\ D(A) = H^2(\Omega) \cap H_0^1(\Omega), \quad Ax(\xi) = \Delta x(\xi) \\ \varphi(x) = \int_\Omega \phi(x(\xi))\,d\xi. \end{cases}$$

The assumptions of the Hille-Yosida Theorem are easily checked since $A$ is self-adjoint and dissipative (see, e.g., [17]).

The boundary control problem (2.2) in the Dirichlet case can also be given an abstract setup by choosing $X$, $A$, and $\varphi$ as above and taking $U = \{u \in L^\infty(\partial\Omega) \mid |u(\xi)| \le M,\ \xi \in \partial\Omega \text{ a.e.}\}$. The state equation is of the form (2.4) with a discontinuous $f$ as follows:

$$\begin{cases} x'(t) = Ax(t) - ADu(t) \\ x(t_0) = x_0 \end{cases}$$

where $D : U \to X$ is the so-called Dirichlet map defined as

$$Du = w \iff \begin{cases} \Delta w = 0 & \text{in } \Omega \\ w = u & \text{on } \partial\Omega. \end{cases}$$


By similar methods one can also treat problem (2.2) in the Neumann case. The value function of problem (2.4)-(2.5) is defined as

$$V(t_0,x_0) = \inf\{\varphi(x(T;t_0,x_0,u)) \mid u : [0,T] \to U \text{ measurable}\}$$

for all $(t_0,x_0) \in [0,T] \times X$ and the corresponding Hamilton-Jacobi equation is the following

$$\begin{cases} -v_t + H(t,x,-D_x v) - \langle Ax, D_x v \rangle = 0, \\ v(T,x) = \varphi(x) \end{cases} \tag{2.6}$$

where

$$H(t,x,p) = \sup_{u \in U} \langle p, f(t,x,u) \rangle. \tag{2.7}$$

Comparing equation (2.6) with (1.7), it is easy to see that it presents additional difficulties: indeed the term $\langle Ax, D_x v \rangle$ is discontinuous in $X$, as $A$ is an unbounded operator. Nevertheless, after the pioneering work [1] concerning the linear-convex case, several results have been obtained to extend the viscosity solution approach to infinite dimensions; see [7, 13, 18]. The Hamilton-Jacobi equation related to the boundary control problem (2.2) in the Neumann case is studied in [5].


3.  Stochastic  optimal  control 

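Let now $Q$ be a positive nuclear operator in $X$, so that there exists a complete orthonormal system in $X$, $\{e^k\}$, and a sequence of nonnegative real numbers, $\{\lambda_k\}$, such that

$$\begin{cases} \text{(i)} \ Qe^k = \lambda_k e^k, \quad k \in \mathbb{N} \\ \text{(ii)} \ \sum_{k=1}^{\infty} \lambda_k < \infty \end{cases} \tag{3.1}$$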

Let $(\Omega, \mathcal{F}, P)$ be a complete probability space and $W(t)$ a $Q$-Wiener process in $X$, i.e.,

$$W(t) = \sum_{k=1}^{\infty} \sqrt{\lambda_k}\,\beta_k(t)\,e^k, \tag{3.2}$$

where $\beta_k$ are mutually independent standard Brownian motions.

For any $t \ge 0$ let $\mathcal{F}_t$ be the $\sigma$-algebra generated by $\{\beta_k(s) \mid k \ge 1,\ 0 \le s \le t\}$ and let $M^2(t_0,T;X)$ denote the space of the $X$-valued processes $x(\cdot)$ such that $x(t)$ is $\mathcal{F}_t$-measurable for all $t_0 \le t \le T$ and

$$E \int_{t_0}^{T} \|x(t)\|^2\,dt < \infty$$

where $E$ denotes the expectation.
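The series (3.2) can be sketched numerically by keeping finitely many modes. The Python fragment below simulates the coefficients $\sqrt{\lambda_k}\,\beta_k(t)$ on a time grid; the truncation level, the function names and the choice of eigenvalues in the usage lines are assumptions made for illustration and are not taken from the text.

```python
import numpy as np

def simulate_q_wiener(lam, t_grid, rng):
    """Coefficients <W(t), e^k> = sqrt(lam_k) * beta_k(t) of a truncated Q-Wiener process.

    lam    : eigenvalues lambda_1..lambda_K of Q (the truncation level K is an assumption)
    t_grid : increasing time grid starting at 0
    Returns an array of shape (len(t_grid), K).
    """
    lam = np.asarray(lam, dtype=float)
    dt = np.diff(t_grid)
    # independent Brownian increments, one column per mode k
    incr = rng.standard_normal((len(dt), len(lam))) * np.sqrt(dt)[:, None]
    beta = np.vstack([np.zeros(len(lam)), np.cumsum(incr, axis=0)])
    return np.sqrt(lam) * beta

def expected_sq_norm(lam, t):
    """E ||W(t)||^2 = t * (lam_1 + ... + lam_K) for the truncated expansion."""
    return t * float(np.sum(lam))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lam = 1.0 / np.arange(1, 51) ** 2          # a summable sequence, as (3.1)(ii) requires
    W = simulate_q_wiener(lam, np.linspace(0.0, 1.0, 101), rng)
    print(W.shape, expected_sq_norm(lam, 1.0))
```

The identity $E\|W(t)\|^2 = t \sum_k \lambda_k$ shows why condition (ii) in (3.1) is needed for $W(t)$ to take values in $X$.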

Under assumptions (2.3), for any $u \in M^2(t_0,T;X)$ the state equation

$$\begin{cases} dx(t) = [Ax(t) + f(t,x(t),u(t))]\,dt + dW(t), & t_0 \le t \le T \\ x(t_0) = x_0 \end{cases} \tag{3.3}$$

has a unique mild solution $x(\cdot;t_0,x_0,u)$, i.e.,

$$\begin{aligned} x(t) &= e^{(t-t_0)A}x_0 + \int_{t_0}^{t} e^{(t-s)A} f(s,x(s),u(s))\,ds + W_A(t), \\ W_A(t) &= \int_{t_0}^{t} e^{(t-s)A}\,dW(s) \end{aligned} \tag{3.4}$$

for all $t \in [t_0,T]$. Moreover, $x(\cdot;t_0,x_0,u)$ is continuous with probability one.


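Remark 3.1. When $\{e^k\}$ are eigenvectors of $A$, i.e.,

$$e^k \in D(A), \quad Ae^k = -\alpha_k e^k, \quad \alpha_k > 0,$$

the stochastic convolution can be expressed in the form

$$W_A(t) = \sum_{k=1}^{\infty} \sqrt{\lambda_k} \int_{t_0}^{t} e^{-(t-s)\alpha_k}\,d\beta_k(s)\,e^k. \tag{3.5}$$

Notice that, in this case, operators $Q$ and $e^{tA}$ commute.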
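In the diagonal situation of Remark 3.1 each coefficient of the stochastic convolution is a scalar Wiener integral, so by the Ito isometry its variance is $\lambda_k (1 - e^{-2\alpha_k (t - t_0)})/(2\alpha_k)$. The Python sketch below evaluates this formula and the resulting mean square norm of a truncated $W_A(t)$; the function names and the sample sequences $\lambda_k$, $\alpha_k$ are assumptions for illustration.

```python
import numpy as np

def mode_variance(lam_k, alpha_k, t, t0=0.0):
    """Variance of sqrt(lam_k) * int_{t0}^t exp(-(t - s) * alpha_k) d beta_k(s)."""
    return lam_k * (1.0 - np.exp(-2.0 * alpha_k * (t - t0))) / (2.0 * alpha_k)

def convolution_sq_norm(lam, alpha, t, t0=0.0):
    """E ||W_A(t)||^2 for the expansion truncated to len(lam) modes."""
    lam = np.asarray(lam, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    return float(np.sum(mode_variance(lam, alpha, t, t0)))

if __name__ == "__main__":
    k = np.arange(1, 2001)
    # Hypothetical example: lam_k = 1 for all k and alpha_k = k^2.
    # The partial sums stay below sum_k 1/(2 k^2) = pi^2 / 12.
    print(convolution_sq_norm(np.ones(k.size), k ** 2, t=1.0), np.pi ** 2 / 12)
```

The usage lines illustrate that the smoothing effect of $e^{tA}$ can keep $E\|W_A(t)\|^2$ finite even when $\sum_k \lambda_k$ diverges, provided $\alpha_k$ grows fast enough.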

Given a Lipschitz function $\varphi : X \to \mathbb{R}$, let us consider the problem of

$$\text{minimizing } E[\varphi(x(T;t_0,x_0,u))] \text{ over all } u \in M^2(t_0,T;X) \tag{3.6}$$

in which, for simplicity, we have assumed the state space to coincide with the control space. The value function of problem (3.3)-(3.6) is defined, as usual, by

$$V(t_0,x_0) = \inf\{E[\varphi(x(T;t_0,x_0,u))] \mid u \in M^2(t_0,T;X)\}.$$


The Hamilton-Jacobi equation related to the stochastic optimal control problem (3.3)-(3.6) is the following semilinear parabolic equation in infinitely many variables

$$\begin{cases} v_t + \tfrac{1}{2}\mathrm{Tr}(Q D_x^2 v) + \langle Ax, D_x v \rangle - H(t,x,-D_x v) = 0, \\ v(T,x) = \varphi(x) \end{cases} \tag{3.7}$$


Cannarsa, Da Prato


where $\mathrm{Tr}$ denotes the trace, $D_x^2 v$ the second derivative with respect to $x$ (which is an operator in $X$) and $H$ is still given by (2.7).

Remark 3.2. It is very interesting for applications to consider the case of a cylindrical Wiener process, i.e., of a process of the form (3.2) with $\lambda_k = 1$, $\forall k$ or, equivalently, $Q = I$. In this case, $W$ is usually referred to as a white noise. Although $W$ is not a Gaussian process in $X$, one can show that under suitable assumptions the convolution process $W_A$ in (3.4) is Gaussian (see the next section).

In [3] the Hamilton-Jacobi equation (3.7) is studied by perturbation methods. For this purpose, it is essential to solve a linearized form of (3.7). While we refer the reader to [10] for a probabilistic approach to linear equations, in the next section we will explain a different method based only on functional analysis. To simplify the exposition, we will describe this technique in the case of a nuclear $Q$. Different approaches to (3.7) are developed in [8, 12, 15], and [14].

4.  A  functional  analysis  approach 

In this section we consider the problem

$$\begin{cases} v_t = \tfrac{1}{2}\mathrm{Tr}(Q D_x^2 v) + \langle Ax, D_x v \rangle, \\ v(0,x) = \varphi(x) \end{cases} \tag{4.1}$$

where $Q$ is a linear bounded self-adjoint operator in $X$ satisfying (3.1) and $A$ is the infinitesimal generator of a strongly continuous semigroup in $X$. We begin our analysis with the case of $A = 0$, i.e., with the infinite dimensional heat equation

$$\begin{cases} v_t = \tfrac{1}{2}\mathrm{Tr}(Q D_x^2 v), \\ v(0,x) = \varphi(x). \end{cases} \tag{4.2}$$

We define $C_b(X)$ to be the Banach space of all functions $\varphi : X \to \mathbb{R}$ which are uniformly continuous on $X$ and such that

$$\|\varphi\|_0 = \sup_{x \in X} |\varphi(x)| < \infty.$$

For $k \in \mathbb{N}$, we denote by $C_b^k(X)$ the set of all functions $\varphi : X \to \mathbb{R}$ which are uniformly continuous and bounded on $X$, together with all their Fréchet derivatives up to the order $k$. We set

$$C_b^\infty(X) = \bigcap_{k=1}^{\infty} C_b^k(X).$$

It is well known that, if $X$ is finite-dimensional, then all spaces $C_b^k(X)$, $k = 1,2,\dots,\infty$, are dense in $C_b(X)$. If $X$ is infinite-dimensional, on the contrary, the density of $C_b^k(X)$ in $C_b(X)$ fails to be true for $k \ge 2$, see [16, Section 7]. For this reason we introduce the space $C_S(X)$ to denote the closure of $C_b^2(X)$ in the topology of $C_b(X)$.

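Let us define the operator $A_Q$ by

$$A_Q\varphi = \tfrac{1}{2}\mathrm{Tr}(Q D_x^2 \varphi), \quad \forall \varphi \in C_b^2(X).$$

If we regard an element $\varphi \in C_b^2(X)$ as a function of infinitely many variables

$$\varphi(x) = \varphi(x_1, x_2, \dots, x_n, \dots), \quad x_n = \langle x, e^n \rangle,$$

then the operator $A_Q$ may be represented as follows

$$A_Q\varphi = \frac{1}{2}\sum_{k=1}^{\infty} \lambda_k \frac{\partial^2 \varphi}{\partial x_k^2},$$

where $\frac{\partial \varphi}{\partial x_k} = \langle D_x\varphi, e^k \rangle$ and $\frac{\partial^2 \varphi}{\partial x_k^2} = \langle D_x^2\varphi\, e^k, e^k \rangle$.

It is well known that if the dimension of $X$ is finite, then the linear operator $A_Q$ is not closed in $C_S(X)$. However, we have the following result proved in [2].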


Theorem 4.1. The linear operator $A_Q$ is closable and its closure is the generator of a strongly continuous semigroup of contractions $S_Q(\cdot)$ in $C_S(X)$.

Clearly, the semigroup $S_Q$ provides a notion of solution to problem (4.2), which is classical for $\varphi \in C_b^2(X)$. Concerning the approximation of the semigroup $S_Q$ by its finite-dimensional projections we have the following result. Let us introduce the heat semigroup in $C_S(X)$ with respect to the $k$-th component, which is defined as

$$(T_k(t)\varphi)(x) = \frac{1}{\sqrt{2\pi\lambda_k t}} \int_{-\infty}^{\infty} e^{-\frac{(x_k - \xi)^2}{2\lambda_k t}}\, \varphi(x_1, \dots, x_{k-1}, \xi, x_{k+1}, \dots)\,d\xi, \quad \forall x \in X.$$

It is easy to show that

$$\|T_k(t)\varphi\|_j \le \|\varphi\|_j. \tag{4.3}$$


Lemma 4.2. Assume (3.1); then the product $\prod_{k=1}^{n} T_k(t)\varphi$ converges in $C_S(X)$ as $n \to \infty$, for all $\varphi \in C_S(X)$, uniformly on the bounded sets of $[0,\infty[$. Moreover, the operator

$$S_Q(t)\varphi := \lim_{n \to \infty} \prod_{k=1}^{n} T_k(t)\varphi, \quad \varphi \in C_S(X) \tag{4.4}$$

is a strongly continuous semigroup of contractions in $C_S(X)$. Furthermore, for any $k \in \mathbb{N}$, the semigroup $S_Q(t)$ maps $C_b^k(X)$ into itself, and the restriction of $S_Q(t)$ to $C_b^k(X)$ is again a strongly continuous semigroup of contractions.
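A finite-dimensional sketch of the product formula (4.4) is given below. It assumes that $T_k(t)$ acts as the Gaussian average with variance $\lambda_k t$ in the $k$-th coordinate (the one-dimensional heat semigroup generated by $\frac{\lambda_k}{2}\partial^2/\partial x_k^2$) and approximates each average by Gauss-Hermite quadrature. The helper names, the number of quadrature nodes and the test function are assumptions chosen for illustration; the cost grows exponentially with the number of coordinates, so this is only meant for a handful of modes.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def apply_Tk(phi, k, lam_k, t, nodes=20):
    """Return the function T_k(t) phi, which averages phi in coordinate k only."""
    z, w = hermgauss(nodes)                     # nodes/weights for the weight exp(-z^2)
    shift = np.sqrt(2.0 * lam_k * t) * z        # quadrature points of N(0, lam_k * t)

    def smoothed(x):
        x = np.asarray(x, dtype=float)
        total = 0.0
        for s, wi in zip(shift, w):
            y = x.copy()
            y[k] += s                           # perturb the k-th coordinate only
            total += wi * phi(y)
        return total / np.sqrt(np.pi)

    return smoothed

def product_semigroup(phi, lam, t, nodes=20):
    """Finite product T_1(t) ... T_n(t) phi for n = len(lam) coordinates."""
    out = phi
    for k, lam_k in enumerate(lam):
        out = apply_Tk(out, k, lam_k, t, nodes)
    return out

if __name__ == "__main__":
    a = np.array([1.0, 0.5, 2.0])
    lam = [1.0, 0.25, 0.1]
    phi = lambda x: np.cos(np.dot(a, x))
    x = np.array([0.3, -1.0, 0.7])
    approx = product_semigroup(phi, lam, t=0.5)(x)
    exact = np.exp(-0.25 * np.dot(lam, a ** 2)) * np.cos(np.dot(a, x))
    print(approx, exact)
```

For $\varphi(x) = \cos\langle a, x \rangle$ the product has the closed form $e^{-\frac{t}{2}\sum_k \lambda_k a_k^2}\cos\langle a, x \rangle$, which makes the role of the summability condition (3.1)(ii) visible: the exponent stays bounded as more coordinates are added.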

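Proof. We first consider $\varphi \in C_b^2(X)$. In this case we have

$$\begin{aligned} \Big\| \prod_{k=1}^{n+1} T_k(t)\varphi - \prod_{k=1}^{n} T_k(t)\varphi \Big\|_0 &\le \Big\| \prod_{k=1}^{n} T_k(t) \Big\| \, \|T_{n+1}(t)\varphi - \varphi\|_0 \\ &\le \int_0^t \Big\| \frac{\partial}{\partial s} T_{n+1}(s)\varphi \Big\|_0 ds = \frac{\lambda_{n+1}}{2} \int_0^t \Big\| \frac{\partial^2}{\partial x_{n+1}^2} T_{n+1}(s)\varphi \Big\|_0 ds. \end{aligned}$$

Therefore, recalling that $Q$ is nuclear, we conclude that $\big\{\prod_{k=1}^{n} T_k(t)\varphi\big\}_{n \in \mathbb{N}}$ is a Cauchy sequence for all $\varphi \in C_b^2(X)$. Since

$$\Big\| \prod_{k=1}^{n} T_k(t) \Big\|_{\mathcal{L}(C_S(X))} \le 1$$

and $C_b^2(X)$ is dense in $C_S(X)$, the first part of the conclusion follows. The general case is easily checked. □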

We now turn to the general equation (4.1). Let us first proceed formally, by assuming that $v(t,x)$ is a solution to (4.1). Setting

$$v(t,x) = u(t, e^{tA}x) \tag{4.5}$$

we obtain

$$\begin{aligned} v_t(t,x) &= u_t(t, e^{tA}x) + \langle Ae^{tA}x, D_x u(t, e^{tA}x) \rangle, \quad x \in D(A) \\ D_x v(t,x) &= e^{tA^*} D_x u(t, e^{tA}x) \\ D_x^2 v(t,x) &= e^{tA^*} D_x^2 u(t, e^{tA}x)\, e^{tA}. \end{aligned}$$

It follows that

$$u_t(t, e^{tA}x) = \tfrac{1}{2}\mathrm{Tr}\big[e^{tA} Q e^{tA^*} D_x^2 u(t, e^{tA}x)\big],$$

thus $u$ is a solution of the problem

$$\begin{cases} u_t(t,x) = \tfrac{1}{2}\mathrm{Tr}\big[e^{tA} Q e^{tA^*} D_x^2 u(t,x)\big] & \text{in } [0,T] \times X \\ u(0,x) = \varphi(x), & \varphi \in C_S(X). \end{cases} \tag{4.6}$$

In the following, we will first solve (4.6) by using an abstract result [9], then we will show that formula (4.5) gives a solution of our problem.

We write problem (4.6) in abstract form in the Banach space $\mathcal{X} = C_S(X)$, as follows

$$\begin{cases} u'(t) = \mathcal{A}(t)u(t) \\ u(0) = \varphi \end{cases} \tag{4.7}$$

where $\mathcal{A}(t)$ is the closure of the operator

$$\mathcal{A}(t)\varphi = \tfrac{1}{2}\mathrm{Tr}\big(e^{tA} Q e^{tA^*} D_x^2 \varphi\big), \quad \forall \varphi \in C_b^2(X),\ t \in [0,T]. \tag{4.8}$$

By Theorem 4.1, $\mathcal{A}(t)$ is the generator of a strongly continuous semigroup in $\mathcal{X}$. More generally, we consider the problem

$$\begin{cases} u'(t) = \mathcal{A}(t)u(t) + f(t) \\ u(0) = \varphi \end{cases} \tag{4.9}$$

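where $f \in L^p(0,T;\mathcal{X})$, $p \ge 1$. We recall that $u \in L^p(0,T;\mathcal{X})$ is a strong solution of (4.9) if there exists a sequence $\{u_n\}$ in $W^{1,p}(0,T;\mathcal{X})$ such that

1) $u_n \to u$ in $C([0,T];\mathcal{X})$

2) $u_n'(\cdot) - \mathcal{A}(\cdot)u_n \to f(\cdot)$ in $L^p(0,T;\mathcal{X})$

3) $u_n(0) \to \varphi$ in $\mathcal{X}$

Moreover, $u$ is a strict solution of (4.9) if

1) $u \in W^{1,p}(0,T;\mathcal{X})$,

2) $u(t) \in D(\mathcal{A}(t))$ for a.e. $t \in [0,T]$ and $\mathcal{A}(\cdot)u(\cdot) \in L^p(0,T;\mathcal{X})$,

3) problem (4.9) is fulfilled.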

The following result is a special case of Theorem 3 in [9].

Proposition 4.3. Let $\mathcal{X}$, $\mathcal{Y}$ be Banach spaces with $\mathcal{Y} \subset \mathcal{X}$ continuously and densely. Let $\{\mathcal{A}(t)\}_{t \in [0,T]}$ be a family of linear operators in $\mathcal{X}$ such that

1) $\mathcal{A}(t)$ is a generator of a strongly continuous semigroup in $\mathcal{X}$, with growth bound $\omega_X$

2) $\mathcal{Y} \subset D(\mathcal{A}(t))$ for all $t \in [0,T]$

3) $\mathcal{A}(\cdot)y \in C([0,T];\mathcal{X})$, for all $y \in \mathcal{Y}$

4) The part $\mathcal{A}_Y(t)$ of $\mathcal{A}(t)$ in $\mathcal{Y}$ is the generator of a contraction semigroup in $\mathcal{Y}$, with growth bound $\omega_Y$.

Then, $\forall \varphi \in \mathcal{X}$, $\forall f \in L^p(0,T;\mathcal{X})$, $p \ge 1$, problem (4.9) has a unique strong solution $u$. Moreover, $u \in C([0,T];\mathcal{X})$ and there exists a constant $K > 0$, depending only on $\omega_X$ and $\omega_Y$, such that

$$\|u(t)\|_{\mathcal{X}} \le K\big\{\|\varphi\|_{\mathcal{X}} + \|f\|_{L^p(0,T;\mathcal{X})}\big\}. \tag{4.10}$$

We now solve problem (4.6). By a strong solution of (4.6) we mean a strong solution of (4.7) in $L^p(0,T;\mathcal{X})$, $\forall p \ge 1$. For the following result see [2].

Proposition 4.4. Assume (3.1) and (2.3) and let $\varphi \in C_S(X)$. Then problem (4.6) has a unique strong solution $u$ which, in addition, belongs to $C([0,T];C_S(X))$. Moreover, if $\varphi \in C_b^2(X)$ then $u \in C^1([0,T];C_S(X)) \cap C([0,T];C_b^2(X))$, and equation (4.6) is fulfilled. Furthermore, there exists a constant $C$, depending only on $\omega_X$ and $\omega_Y$, such that

$$\|u'(t,\cdot)\|_0 + \|u(t,\cdot)\|_2 \le C\|\varphi\|_2 \tag{4.11}$$

for all $t \in [0,T]$.

Stochastic differential equations governing processes taking values either in Hilbert spaces or in duals of nuclear spaces arise in studying stochastic models of the behavior of voltage potentials of spatially extended neurons. A brief survey of the theory developed so far will be given in as self-contained a manner as possible. The asymptotic behavior of large systems of interacting neurons will also be considered.

1. Introduction

In this article, my aim is to present a brief survey of recent advances in infinite dimensional stochastic differential equations (SDE's). I shall naturally be concerned with the work that a number of my colleagues, my students and I have been doing over the last several years. The work originated in attempts to find suitable SDE models to describe the space-time fluctuation of voltage potentials of spatially extended neurons. The idea of finding stochastic partial differential equations (SPDE's) sprang from the efforts to introduce randomness into the Hodgkin-Huxley theory, in which a system of nonlinear partial differential equations (PDE's) plays a fundamental role in this part of neurophysiology. Our work led us to investigate both linear and certain types of nonlinear SDE's governing stochastic processes taking values in a particular type of infinite dimensional space, namely, the dual of a nuclear space.

A brief account of the principal results will be presented in which the emphasis will be on the manner in which the problems are formulated and the SDE's derived rather than on the proofs. The reader interested in the latter will find them in the cited papers, which have either been already published or are being prepared for publication.

† This research was supported by the Air Force Office of Scientific Research Contract No. F-90-0030.

As a result of this preoccupation with nuclear space-valued SDE's several important developments will not be discussed, notably, measure-valued SDE's and especially the Fleming-Viot model for population genetics and the theory of SDE's in Banach spaces (see [2, 9] and [11]). Since my object is to give a basic introduction to these equations, it would take me too far afield to consider applications to chemical kinetics or to include recent work on interacting neuronal systems. The reader is referred to [10, 6] and [1] for an account of these developments.

In what follows (particularly in Sections 2 and 3) applications to neuronal cell behavior will be kept in mind. However, the SDE's presented here have applications to other areas such as chemical reaction diffusions, random strings and fluctuation limit theorems of interacting particle diffusions. The important problem of diffusion approximation is introduced in Section 4.

It is by now well-recognized in the neurophysiological literature that a neuron cell is spatially extended and hence a realistic mathematical description of neuronal activity would have to take into account synaptic inputs that occur randomly in time as well as randomly at different locations on the neuron's surface. If $\mathcal{X}$ denotes the cell membrane, the voltage potential associated with the neuron cell may be regarded as a random field, indexed both by time $t \ge 0$ and by location $x \in \mathcal{X}$. The importance of the study of the fluctuation of the voltage potential consists in the fact that information is transmitted through the changing amplitudes of the electric potential across the cell membrane.

Let $u(t,x)$ represent the difference of the voltage potential at time $t$ and site $x$ from the resting potential of about $-60$ mV. The evolution of $u$ can be ascribed to two causes:

1)  Diffusion  and  leaks  due  essentially  to  "deterministic"  causes.  In  the 
two  examples  of  the  next  section,  these  will  be  represented  by  a  PDE 
(suggested  by  core  conductor  theory)  with  Neumann  or  insulating 
boundary  conditions.  More  generally  (as  is  explained  in  Section  2)  it 
is  abstractly  described  by  a  Hilbert  space  L'^(X,  n)  and  a  semigroup 
defined  on  it  possessing  certain  properties. 

2) Random fluctuations: When a burst of neurotransmitter hits some place on the membrane, the potential will jump up or down by a random amount at a random time and location. It is reasonable to model this randomness by a Gaussian space-time process. Alternatively, since the arrivals of the impulses at distant locations or in disjoint time intervals are believed to be approximately independent, they may be modeled as a mixture of Poisson processes or as a generalized Poisson process.

These questions form the basis of the linear SDE's of the next section which, in turn, lead to more general classes of nuclear space-valued SDE's.


2.  Linear  SPDE’s 

It is convenient to begin with the following class of examples. Consider the cable equation

$$\frac{\partial u}{\partial t} = -\alpha u + \beta \frac{\partial^2 u}{\partial x^2}, \quad t > 0,\ 0 < x < b \tag{2.1}$$

$$u(0,x) = u_0(x) \tag{2.2}$$

where $\alpha$, $\beta$ are positive constants and the initial value is a smooth function $u_0$ on $[0,b]$. (2.1) is the PDE that governs $u(t,x)$, the value of the membrane potential (more precisely, the difference between the membrane potential and the resting potential) in the absence of external impulses. One way to convert (2.1) into a model that permits randomness in the phenomenon being investigated is, formally, to add Gaussian white noise in space-time to get

$$\frac{\partial u}{\partial t} = -\alpha u + \beta \frac{\partial^2 u}{\partial x^2} + \dot{W}_{tx}. \tag{2.3}$$


The new term $\dot{W}_{tx}$ is the fictitious derivative of the space-time Wiener process or Brownian sheet $W_{tx}$. The latter (along with its variants) is an important and basic process in the theory of SPDE's. The spatial variable $x$ can be $n$-dimensional. $W_{t,x_1,\dots,x_n}$ ($t \ge 0$, $x_i \in [0,b]$, $i = 1,\dots,n$) is a family of random variables defined on a probability space $(\Omega, \mathcal{F}, P)$, with the following properties:

(i): For every $(t,x_1,\dots,x_n)$, $W_{t,x_1,\dots,x_n}$ is a Gaussian random variable with mean 0 and covariance

(ii): $E(W_{t,x_1,\dots,x_n} W_{s,y_1,\dots,y_n}) = (t \wedge s)(x_1 \wedge y_1) \cdots (x_n \wedge y_n)$.

(iii): For each $\omega$ or $P$-a.e. $\omega$ in $\Omega$, the paths $(t,x_1,\dots,x_n) \to W_{t,x_1,\dots,x_n}(\omega)$ are continuous.


It is well known that as a function of $(t,x_1,\dots,x_n)$, the function in (iii) is almost nowhere differentiable. Hence (taking $n = 1$), (2.3) cannot be regarded as a PDE with a random term $\dot{W}_{tx}$ added on. A rigorous reformulation of (2.3) involves the extension to PDE's of Ito's theory of (ordinary) SDE's. Once this is done, the rich theory of stochastic calculus is available as a powerful apparatus for solving (2.3). (However, the fact that we are dealing with multiparameter processes or random fields and partial differential operators imposes limitations on this approach, as we shall see below.)

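Prior to rewriting (2.3) as an Ito type SPDE let us note that the Green function of the Neumann problem for ((2.1), (2.2)) is given by

$$G(x,y;t) = \sum_{n=0}^{\infty} e^{-\lambda_n t} \phi_n(x)\phi_n(y), \quad (t > 0) \tag{2.4}$$

where

$$\lambda_0 = \alpha, \quad \lambda_n = \alpha + \beta\Big(\frac{n\pi}{b}\Big)^2$$

for $n \ge 1$ and

$$\phi_0(x) = b^{-1/2}, \qquad \phi_n(x) = 2^{1/2} b^{-1/2} \cos\Big(\frac{n\pi x}{b}\Big) \quad (n \ge 1).$$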

For any $f \in H = L^2([0,b],dx)$ if we define

$$(T_t f)(x) = \int_0^b G(x,y;t) f(y)\,dy \quad \text{for } t > 0 \text{ and } T_0 = I, \tag{2.5}$$

it is easily verified that $T_t$ is a contraction semigroup on $H$. The generator $-L$ has dense domain and agrees with $-\alpha I + \beta \frac{\partial^2}{\partial x^2}$ on smooth functions. $L$, of course, is a positive operator. (2.3) can be rewritten as

$$\begin{aligned} du(t,x,\omega) &= -Lu(t,x,\omega)\,dt + dW_{tx}(\omega) \\ u(0,x,\omega) &= u_0(x). \end{aligned} \tag{2.6}$$


The  above  is  an  example  of  the  Ornstein-Uhlenbeck  (OU)  SPDE.  What  we 
mean  by  a  solution  of  this  equation  will  be  made  clear  in  Section  3  where  a 
general  definition  is  given. 
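The eigenfunction expansion (2.4) and the semigroup (2.5) can be sketched numerically by truncating the series. In the Python fragment below the number of retained modes, the uniform grid starting at 0 and the trapezoid quadrature are assumptions of the example, as are the function names.

```python
import numpy as np

def cable_eigenpair(n, alpha, beta, b):
    """Eigenvalue lambda_n and Neumann eigenfunction phi_n appearing in (2.4)."""
    lam = alpha + beta * (n * np.pi / b) ** 2
    if n == 0:
        phi = lambda x: np.full_like(np.asarray(x, dtype=float), b ** -0.5)
    else:
        phi = lambda x: np.sqrt(2.0 / b) * np.cos(n * np.pi * np.asarray(x) / b)
    return lam, phi

def apply_semigroup(f_vals, x, t, alpha, beta, n_modes=30):
    """Approximate (T_t f)(x) on a uniform grid x over [0, b], keeping n_modes terms of G."""
    b = x[-1] - x[0]
    w = np.full(len(x), x[1] - x[0])            # trapezoid quadrature weights
    w[0] *= 0.5
    w[-1] *= 0.5
    out = np.zeros_like(f_vals, dtype=float)
    for n in range(n_modes):
        lam, phi = cable_eigenpair(n, alpha, beta, b)
        coeff = np.sum(w * phi(x) * f_vals)     # <f, phi_n> in L^2([0, b])
        out += np.exp(-lam * t) * coeff * phi(x)
    return out

if __name__ == "__main__":
    x = np.linspace(0.0, 2.0, 401)
    f = x * (2.0 - x)
    Tf = apply_semigroup(f, x, t=0.3, alpha=1.0, beta=0.5)
    print(np.max(np.abs(Tf)), np.max(np.abs(f)))
```

Each mode is damped by $e^{-\lambda_n t}$ with $\lambda_n \ge \alpha > 0$, which is the contraction property of $T_t$ stated above.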

By substituting the expression for $G$ into the RHS of

$$u(t,x,\omega) = \int_0^b G(x,y;t)\,u_0(y)\,dy + \int_0^t \int_0^b G(x,y;t-s)\,dW_{sy}(\omega), \tag{2.7}$$

it can be seen that (2.7) solves (2.6). The L.H.S. of (2.7) is an example of a Gaussian random field, i.e., a Gaussian process of the two parameters $t$ and $x$. Various regularity properties of $u$ can be derived (see [14]). In particular, $u(t,x,\omega)$ is, for almost all $\omega$, a continuous function of $(t,x)$. One may hence regard $u(t,\cdot,\omega)$ as a process depending on the single parameter $t$ and taking values in the Banach space $C(0,b)$. It is possible, but not particularly illuminating, to look upon $u(t,\cdot)$ as the solution of an infinite dimensional SDE.

The situation changes radically when we go from one space dimension to two. The following example is due to J.B. Walsh [15]. Replace (2.3) by

$$\frac{\partial u}{\partial t}(t;x,y) = (\Delta u - u)(t;x,y) + \dot{W}_{txy} \tag{2.8}$$

with initial and boundary conditions

$$\frac{\partial u}{\partial x}(t;0,y) = \frac{\partial u}{\partial x}(t;\pi,y) = \frac{\partial u}{\partial y}(t;x,0) = \frac{\partial u}{\partial y}(t;x,\pi) = 0$$

and $u(0;x,y) = 0$ for all $x$ and $y$. (Here, we have taken $b = \pi$ for convenience.) As in the previous problem, we have a contraction semigroup $T_t$ on $L^2([0,\pi]^2)$ with generator $-L$ which coincides with $\Delta - I$ on the smooth functions on $[0,\pi]^2$. The eigenfunctions and eigenvalues of $L$ are $\phi_{jk}(x,y) = \phi_j(x)\phi_k(y)$, where $\phi_j(x)$ is the same as in the previous example (with $b = \pi$), and $\lambda_{jk} = 1 + j^2 + k^2$. The method for obtaining a random field solution to the corresponding SPDE fails because $\sum_{j,k} \lambda_{jk}^{-1}$ diverges.

We  shall  return  to  this  example  later  in  this  section. 
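The contrast between one and two space dimensions can be checked numerically by comparing partial sums of the reciprocal eigenvalues. The sketch below takes $\alpha = \beta = 1$ and $b = \pi$ in the one-dimensional case, so that $\lambda_n = 1 + n^2$, and uses $\lambda_{jk} = 1 + j^2 + k^2$ in two dimensions; the function names and cut-off values are assumptions of the example.

```python
import numpy as np

def partial_sum_1d(n_max):
    """Sum of 1 / (1 + n^2) for 0 <= n <= n_max (one space dimension)."""
    n = np.arange(0, n_max + 1)
    return float(np.sum(1.0 / (1.0 + n ** 2)))

def partial_sum_2d(n_max):
    """Sum of 1 / (1 + j^2 + k^2) for 0 <= j, k <= n_max (two space dimensions)."""
    j = np.arange(0, n_max + 1)
    J, K = np.meshgrid(j, j, indexing="ij")
    return float(np.sum(1.0 / (1.0 + J ** 2 + K ** 2)))

if __name__ == "__main__":
    for n_max in (50, 100, 200, 400):
        print(n_max, partial_sum_1d(n_max), partial_sum_2d(n_max))
```

The one-dimensional sums settle quickly, whereas the two-dimensional sums keep growing by a roughly constant amount each time the cut-off is doubled, which is the logarithmic divergence responsible for the failure of the random field approach.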

Equations of the kind introduced above were studied in [7] as SDE's governing processes taking values in duals of nuclear spaces. The spaces which we will be chiefly concerned with will be denoted by $\Phi$ and $\Phi'$ where $\Phi$ is a countable Hilbertian nuclear space (CHNS) and $\Phi'$ is its topological dual. By a CHNS we mean a linear vector space whose topology is given by an increasing family of Hilbertian norms or seminorms $\|\cdot\|_r$ such that if $H_r$ denotes the $\|\cdot\|_r$-completion of $\Phi$, then

$$\Phi \subset \dots \subset H_1 \subset H_0 = H_0' \subset H_{-1} \subset \dots \subset \Phi'$$

and $\Phi = \bigcap_r H_r$. Furthermore $r \le s$ implies $H_s \subset H_r$. The property of nuclearity says that for every integer $r$, there exists $s > r$ such that the injection map $H_s \hookrightarrow H_r$ is Hilbert-Schmidt.

A typical example of $\Phi$ is the space $\mathcal{S}(\mathbb{R}^d)$ of rapidly decreasing functions on $\mathbb{R}^d$. The dual $\Phi'$ then consists of the Schwartz distributions. In applications, duals of nuclear spaces other than Schwartz distributions are sometimes encountered. While most SPDE's or infinite dimensional SDE's have Gaussian white noise as the driving term, there are examples such as applications to fluctuations of the voltage potentials of cells of the central nervous system where it seems natural to assume that the random impulses arise from a Poisson random measure. This is the motivation for the definition of a $\Phi'$-valued martingale. A $\Phi'$-valued stochastic process $M_t$ ($t \ge 0$) is a martingale relative to a filtration $(\mathcal{F}_t)$ if for each $\varphi \in \Phi$, $M_t[\varphi]$ is a real-valued martingale.

For convenience, we will deal here only with square integrable martingales, i.e., such that $E M_t[\varphi]^2 < \infty$ for each $\varphi$. The reader will find properties of $\Phi'$-valued martingales and the existence of a cadlag (= right continuous with left limits) version as well as properties of stochastic integrals w.r.t. $(M_t)$ in [12] and [6].

An important example of a $\Phi'$-valued martingale is the Wiener process $W_t$, which is a $\Phi'$-valued process such that $t \to W_t$ is continuous, $E W_t[\varphi] = 0$ and $E W_t[\varphi] W_s[\psi] = (t \wedge s) Q(\varphi,\psi)$ where $\varphi, \psi \in \Phi$ and $Q$ is a bilinear form continuous on $\Phi \times \Phi$. Any such Wiener process actually lives in $C(\mathbb{R}_+, H_{-q})$ for some $q > 0$. An example of a discontinuous martingale defined by a Poisson random measure will be encountered in Section 3.

Definition 2.1 (Ornstein-Uhlenbeck $\Phi'$-SDE's). The SDE is given by

$$d\xi_t = A'(t)\xi_t\,dt + dM_t \quad (t > 0) \tag{2.9}$$

$$\xi_0 = \eta \tag{2.10}$$

where

(i): $M = (M_t)_{t \in \mathbb{R}_+}$ is a $\Phi'$-valued martingale,

(ii): $\eta$ is a $\Phi'$-valued random variable independent of $M$ and

(iii): $A(t)$ is the generator of a Kolmogorov type evolution semigroup on $\Phi$ defined as follows: $\{S(s,t),\ 0 \le s \le t < \infty\}$ is a semigroup on $\Phi$ satisfying the conditions given below for each $T > 0$.

(a) :  There  exists  qo  $  0  such  that  for  all  q  ^  qo  ||  S(s,t)(p  1),,$ 

II  ip  II (ip  £  (D.aq.Oq  positive  constants  which, 
together  with  qo  may  depend  on  T). 

(b): $\frac{\partial}{\partial t} S(s,t)\varphi = S(s,t)A(t)\varphi$

(c): $\frac{\partial}{\partial s} S(s,t)\varphi = -A(s)S(s,t)\varphi$ $(0 \le s \le t \le T)$.

$A'(t)$ is the adjoint of $A(t)$ defined by $(A'(t)u)[\varphi] = u[A(t)\varphi]$.


Theorem 2.2. [6] Let $M_0 = 0$ a.s. and $E M_t[\varphi]^2 < \infty$. Further, let $E\|\eta\|_{-r}^2 < \infty$ for some $r > 0$. Then for each $T > 0$, there exists $p_T > 0$ such that $(\xi_t)_{0 \le t \le T} \in D([0,T], H_{-p_T})$ a.s. and

$$\xi_t[\varphi] = \eta[S(0,t)\varphi] + \sum_{j=1}^{\infty} \int_0^t \langle S(s,t)\varphi, \varphi_j \rangle_{p_T}\,dM_s[\varphi_j] \tag{2.11}$$

is the unique solution of (2.9)-(2.10). Here $(\varphi_j)$ is a CONS in $H_{p_T}$.

Nuclear spaces $\Phi$, $\Phi'$ occur in connection with a rigged Hilbert space. We are given a separable Hilbert space $H$, a CHNS $\Phi$ and its dual $\Phi'$ s.t.

$$\Phi \hookrightarrow H \hookrightarrow \Phi'$$

where the injection maps are continuous. $H$ itself may or may not coincide with $H_0$.

In many practical problems (including the examples at the beginning of this section) one is given a Hilbert space (usually an $L^2$-space) and a continuous semigroup $T_t$ on it which is naturally specified in the problem. The space $\Phi$ is not given in advance. For a wide class of problems, the generator $-L$ has the Hilbert-Schmidt property that $(I + L)^{-r_1}$ is a Hilbert-Schmidt operator on $H$ for some $r_1 > 0$. A CHNS space $\Phi$, appropriate to the problem, can be constructed in the following manner. Let $(\varphi_j)$ and $(\lambda_j)$ be the eigenfunctions and eigenvalues, respectively, of $L$. The Hilbert-Schmidt condition implies that $(\varphi_j)$ is a CONS in $H$. Define

$$\Phi = \Big\{ \varphi \in H : \sum_{j=1}^{\infty} (1 + \lambda_j)^{2r} \langle \varphi, \varphi_j \rangle_H^2 < \infty \ \ \forall r \ge 0 \Big\}. \tag{2.12}$$

Let $H_r$ be the completion of $\Phi$ under the Hilbertian norm $\|\varphi\|_r^2 = \sum_{j=1}^{\infty} (1 + \lambda_j)^{2r} \langle \varphi, \varphi_j \rangle_H^2$. Then $\Phi = \bigcap_r H_r$ and $H_0 = H$. For any $r \ge 0$, the injection map $H_s \hookrightarrow H_r$ is Hilbert-Schmidt for $s > r + r_1$. We shall refer to $(H, T_t, \Phi)$ as a compatible family.

Suppose that $W_t$ is a $\Phi'$-valued Wiener process with covariance kernel $Q$ and such that $W \in C(\mathbb{R}_+, H_{-q})$. The next result is a useful special case of Theorem 2.2.

Theorem 2.3. Consider the Ornstein-Uhlenbeck SDE

$$d\xi_t = -L'\xi_t\,dt + dW_t, \tag{2.13}$$

$E\|\eta\|_{-r_2}^2 < \infty$ for some $r_2 > 0$, $\xi_0 = \eta$, where $\eta$ is a Gaussian random variable independent of $W$. Then (2.13) has a unique solution $\xi = (\xi_t)_{t \in \mathbb{R}_+}$ such that

$$\xi \in C(\mathbb{R}_+, H_{-p}) \quad \text{for } p \ge \max(q + r_1, r_2) \tag{2.14}$$

and $\xi$ is a Gaussian process;

$$\xi_t = \sum_{j=1}^{\infty} \xi_t^j \varphi_j, \tag{2.15}$$

the series converging uniformly a.s. in $0 \le t \le T$ in the $H_{-p}$-topology. The coefficient processes $\xi_t^j$ are real-valued Ornstein-Uhlenbeck processes satisfying $d\xi_t^j = -\lambda_j \xi_t^j\,dt + dW_t^j$, $\xi_0^j = \eta_j$, where $W_t^j = W_t[\varphi_j]$ and $\xi_t^j = \xi_t[\varphi_j]$. The coefficient processes are independent iff $Q(\varphi_i, \varphi_j) = \delta_{ij}$.
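The series representation (2.15) suggests a simple way to simulate a truncated solution: each coefficient process is a scalar Ornstein-Uhlenbeck process whose transition law is known in closed form. The following is a minimal sketch under stated assumptions, not the construction used in the text: the eigenvalues are hypothetical and strictly positive, the driving Wiener processes are taken independent (the case $Q(\varphi_i,\varphi_j) = \delta_{ij}$), and the parameter `sigma` is added only so that the noiseless case can be checked.

```python
import numpy as np

def simulate_ou_coefficients(lams, eta, T, n_steps, sigma=1.0, rng=None):
    """Simulate the coefficient processes of (2.15) on a uniform time grid.

    Column j follows d xi^j = -lams[j] * xi^j dt + sigma dW^j, xi^j_0 = eta[j].
    Assumption of this sketch: the W^j are independent standard Wiener processes.
    """
    lams = np.asarray(lams, dtype=float)
    eta = np.asarray(eta, dtype=float)
    if np.any(lams <= 0):
        raise ValueError("this sketch assumes strictly positive eigenvalues")
    rng = np.random.default_rng() if rng is None else rng
    dt = T / n_steps
    decay = np.exp(-lams * dt)  # exact one-step decay factor
    # exact standard deviation of the one-step OU transition
    step_sd = sigma * np.sqrt((1.0 - decay**2) / (2.0 * lams))
    xi = np.empty((n_steps + 1, lams.size))
    xi[0] = eta
    for k in range(n_steps):
        xi[k + 1] = decay * xi[k] + step_sd * rng.standard_normal(lams.size)
    return xi

lams = np.arange(1, 6, dtype=float) ** 2  # hypothetical eigenvalues 1, 4, 9, 16, 25
eta = np.ones(5)
paths = simulate_ou_coefficients(lams, eta, T=1.0, n_steps=200,
                                 rng=np.random.default_rng(0))
```

Because the one-step transition is exact, the grid size affects only where the path is sampled, not its law at the grid points.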

We shall now apply this theorem to complete the solution to Walsh's example. (In [15], the existence of a distribution-valued solution is discussed.) Write $D = [0,\pi]^2$ and $H = L^2(D)$ with inner product denoted by $\langle\ ,\ \rangle_0$. With appropriate notational changes in our definition,

$$\|\varphi\|_r^2 = \sum_{j,k} (1+\lambda_{jk})^{2r} \langle \varphi, \varphi_{jk}\rangle_0^2$$

and

$$\Phi = \{\varphi \in L^2(D) : \|\varphi\|_r < \infty \ \ \forall r \ge 0\}.$$

O  is  nuclear  and  can  be  identified  with  the  space  DID)'  of  Schwarz 
distributions.  The  generator  L  satisfies  the  Hilbert-Schmidt  condition  with 
r,  >  (the  smallest  integer  rj,  therefore  is  0).  For  cp  €  <I>,  let  VVtIcp)  = 
$\int\!\!\int_D \varphi(x,y)\, d_{xy}W_{txy}$. Then $W_t(\varphi)$, $\varphi \in \Phi$, is a Gaussian system of random variables with $EW_t(\varphi) = 0$ and $EW_t(\varphi)W_s(\psi) = (t \wedge s)\langle \varphi, \psi\rangle_0$. $W_t$ is thus not an $L^2(D)$-valued random variable but a cylindrical Brownian motion (c.B.m.) on $L^2(D)$. Now, $EW_t(\varphi)^2 = t\|\varphi\|_0^2 \le t\|\varphi\|_1^2$ and the injection map from $H_1$ into $H_0$ is Hilbert-Schmidt. The latter follows from the fact that $\varphi_{jk}^1 = (1+\lambda_{jk})^{-1}\varphi_{jk}$ forms a CONS in $H_1$ and

$$\sum_{j,k} \|\varphi_{jk}^1\|_0^2 = \sum_{j,k} (1+\lambda_{jk})^{-2} < \infty.$$

It then follows that there exists a $\Phi'$-valued Wiener process $W_t$ which actually lives in $H_{-1}$ such that

$$EW_t[\varphi]W_s[\psi] = (t \wedge s)\langle \varphi, \psi\rangle_0.$$

We  can  now  apply  Theorem  2.3  to  our  example  noting  that  ri  =  0,  q  1 
and  T2  =  0  since  q  =  0.  Thus  the  SDE  (2.13)  which  is  a  rigorous  form  of  the 
SPDE  (2.8)  has  a  unique  Gaussian  solution  E.  t  C(Rt,H_)).  In  addition, 
the  Ornstein-Uhlenbeck  coefficient  processes  Ej  in  (2.15)  are  independent. 

3. General $\Phi'$-valued SDE's

Many  physical  problems  often  involve  nonlinearities  which  require  more 
realistic  stochastic  models  than  the  linear  SDE's  considered  so  far.  The 
Hodgkin-Huxley  theory  of  neuronal  behavior  is  a  good  example.  Even  a 
relatively  simple  stochastic  description  of  the  theory  has  to  take  into  ac¬ 
count  some  important  nonlinear  features  such  as  the  reversal  (or  equilib¬ 
rium)  potentials. 
As an illustration, let us consider a "point" neuron, where the spatial extension of the neuron is ignored. The reversal potentials are simply two nonrandom constants, $V_e > 0$ and $V_i < 0$, which are introduced to control the behavior of the potential $V_t$ (strictly speaking, $V_t$ is the difference of the potential from some resting value), i.e., to prevent it from assuming too high a positive or negative value.

If $\gamma > 0$ denotes the rate at which the potential decays while in a quiescent state and if $a_e$ and $a_i$ are the magnitudes of the excitatory and inhibitory impulses arriving in independent Poisson processes $N_e$ and $N_i$, then $V_t$ satisfies the SDE

$$dV_t = -\gamma V_t\,dt + (V_e - V_t)\,a_e\,dN_e(t) + (V_i - V_t)\,a_i\,dN_i(t) \tag{3.1}$$

$$(V_0 = \text{given initial value}). \tag{3.2}$$
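A path of (3.1) can be simulated exactly, since the potential decays exponentially between impulses and each impulse moves it a fixed fraction of the way toward the corresponding reversal potential. The sketch below is only an illustration under assumptions: the function name and all parameter values are hypothetical, the intensities of $N_e$ and $N_i$ are called `f_e` and `f_i`, and the two Poisson processes are merged into one stream whose events are labeled excitatory with probability `f_e / (f_e + f_i)`.

```python
import numpy as np

def simulate_point_neuron(gamma, v_e, v_i, a_e, a_i, f_e, f_i, v0, T, rng=None):
    """Sample path of (3.1): exponential decay between Poisson impulses.

    Returns the jump times (starting with 0.0) and the potential right after each jump.
    """
    rng = np.random.default_rng() if rng is None else rng
    total_rate = f_e + f_i
    t, v = 0.0, float(v0)
    times, values = [t], [v]
    while True:
        wait = rng.exponential(1.0 / total_rate)  # time to the next impulse of either kind
        if t + wait > T:
            break
        t += wait
        v *= np.exp(-gamma * wait)                # solves dV = -gamma V dt between impulses
        if rng.random() < f_e / total_rate:       # excitatory impulse
            v += a_e * (v_e - v)
        else:                                     # inhibitory impulse
            v += a_i * (v_i - v)
        times.append(t)
        values.append(v)
    return np.array(times), np.array(values)

times, values = simulate_point_neuron(
    gamma=1.0, v_e=1.0, v_i=-0.5, a_e=0.1, a_i=0.1,
    f_e=20.0, f_i=10.0, v0=0.0, T=5.0, rng=np.random.default_rng(1))
```

With $0 < a_e, a_i < 1$ each jump is a convex combination of the current value and a reversal potential, which is why the simulated potential stays between $V_i$ and $V_e$.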


It has been shown that if $a_e$ and $a_i$ are small (which is the case in practice), under suitable conditions involving $\gamma$, $a_e$, $a_i$, $f_e$ and $f_i$ (the last two quantities denoting the respective intensities of $N_e$ and $N_i$), the above SDE approximates a diffusion SDE. More precisely, if the solution of the $n$-th approximating equation is given by $V^n$, then for some sequences of constants $c_n$, $\delta_n$, the discontinuous process $X_t^n = c_n(V_t^n - \delta_n)$ converges in distribution to the solution $X_t$ of the diffusion equation

$$dX_t = -\beta X_t\,dt + \sigma(X_t)\,dW_t. \tag{3.3}$$

In (3.3), $\beta > 0$ is a constant, $W$ is the standard one-dimensional Wiener process and the diffusion coefficient $\sigma$ is given by $\sigma^2(x) = \alpha_0 + \alpha_1 x + \alpha_2 x^2$, where the $\alpha$'s are certain real constants, with $\alpha_2 > 0$. The presence of the nonconstant diffusion coefficient makes $X_t$ a non-Gaussian process and (3.3) a nonlinear SDE. For more details see [8] and the references therein.
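To see the effect of the state-dependent diffusion coefficient in (3.3), one can discretize the equation with the Euler-Maruyama scheme. This is a generic numerical sketch, not a method from the text; the parameter values are hypothetical, and clipping a negative value of $\sigma^2$ to zero is an assumption made only to keep the square root defined.

```python
import numpy as np

def euler_maruyama(beta, alphas, x0, T, n_steps, rng=None):
    """Euler-Maruyama path for dX = -beta X dt + sigma(X) dW,
    with sigma^2(x) = a0 + a1 x + a2 x^2."""
    a0, a1, a2 = alphas
    rng = np.random.default_rng() if rng is None else rng
    dt = T / n_steps
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        var = max(a0 + a1 * x[k] + a2 * x[k] ** 2, 0.0)  # guard: clip negative sigma^2
        dw = rng.normal(0.0, np.sqrt(dt))                # Wiener increment over dt
        x[k + 1] = x[k] - beta * x[k] * dt + np.sqrt(var) * dw
    return x

path = euler_maruyama(beta=1.0, alphas=(1.0, 0.0, 0.5), x0=0.0, T=2.0,
                      n_steps=400, rng=np.random.default_rng(2))
```

Setting all three coefficients to zero removes the noise, and the scheme reduces to the explicit Euler recursion for $dX_t = -\beta X_t\,dt$.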

The above discussion forms the motivation for the study of $\Phi'$-valued diffusion SDE's and SDE's that generalize (3.1). The study of these equations, of course, has a much wider purpose. For both types of SDE's we make the basic assumption on $\Phi$ that there exists $(h_n) \subset \Phi$ such that $(h_n)$ is a CONS in $H_0$ and is a COS in each $H_r$.

Definition 3.1 (Poisson martingale driven SDE). Let $(U, \mathcal{U}, \mu)$ be a $\sigma$-finite measure space such that $L^2(U,\mu)$ is separable. Let $N(du\,ds)$ be a Poisson random measure on $\mathbb{R}_+ \times U$ with intensity measure $\mu(du)\,ds$ and let $\tilde N(du\,ds) = N(du\,ds) - \mu(du)\,ds$ be its compensated random measure. The $\Phi'$-valued SDE is given by

$$dX_t = A(t,X_t)\,dt + \int_U G(t,X_{t-},u)\,\tilde N(du\,dt), \qquad X_0 = 0. \tag{3.4}$$

The coefficient processes $A$ and $G$ satisfy the conditions given below:

• $A : \mathbb{R}_+ \times \Phi' \to \Phi'$,

• $B : \mathbb{R}_+ \times \Phi' \to L(\Phi',\Phi')$, the family of continuous linear operators from $\Phi'$ to itself.

• For each $T > 0$, there exists $p_0 = p_0(T) \in \mathbb{N}$ such that for every $p \ge p_0$ there is a $q \ge p$ and a constant $K = K(p,q)$ such that for $t \in [0,T]$ and $u \in U$,

$$A(t,\cdot) : H_{-p} \to H_{-q}, \qquad G(t,\cdot,u) : H_{-p} \to H_{-p},$$

and for every $\varphi \in \Phi$,

$$A(t,v)[\varphi] \quad \text{and} \quad \int_U (G(t,v,u)[\varphi])^2\,\mu(du) \tag{3.5}$$

are continuous in $v \in H_{-p}$.

(Coercivity) For $\varphi \in \Phi$ $(\subset H_{-p})$,

$$2A(t,\varphi)[\theta_p\varphi] + \int_U \|G(t,\varphi,u)\|_{-p}^2\,\mu(du) \le K(1 + \|\varphi\|_{-p}^2); \tag{3.6}$$

(Growth) For $v \in H_{-p}$,

$$\|A(t,v)\|_{-q}^2 + \int_U \|G(t,v,u)\|_{-p}^2\,\mu(du) \le K(1 + \|v\|_{-p}^2). \tag{3.7}$$

Definition 3.2 (Solution). The notion of solution of an SPDE or a $\Phi'$-valued SDE has to be clarified. For both the SDE (3.4) and the diffusion equation to be given below, as well as for all SDE's, we have two distinct definitions of solution. We give below the definitions for (3.4).

Definition 3.3 (Strong solution). Let a probability basis $(\Omega, (\mathcal{F}_t), P)$ $(t \ge 0)$ satisfying the usual conditions be given and let $N$ be a Poisson random measure adapted to the filtration $(\mathcal{F}_t)$ and with intensity measure $\mu(du)\,ds$. An $(\mathcal{F}_t)$-adapted process $(X_t)_{0 \le t \le T} \in D([0,T],\Phi')$ (Skorohod space) is a strong solution of (3.4) if for each $\varphi \in \Phi$,

$$X_t[\varphi] = X_0[\varphi] + \int_0^t A(s,X_s)[\varphi]\,ds + \int_0^t\!\!\int_U G(s,X_{s-},u)[\varphi]\,\tilde N(du\,ds). \tag{3.8}$$

We also assume that the $\Phi'$-valued random variable $X_0$ satisfies

$$E\|X_0\|_{-p}^2 < \infty \quad \text{for some } p. \tag{3.9}$$

$(X_t)$ is said to be the unique strong solution if the following is true: if $(Y_t)$ is another strong solution, then

$$P(X_t = Y_t \ \ \forall t \in [0,T]) = 1.$$


For most statistical purposes, the distributional properties of a solution are more relevant than the requirement that it be produced as a random function on an arbitrarily chosen probability space. This leads to the concept of a weak solution. A weak solution of an SPDE (or an infinite dimensional SDE) should not be confused with a weak or generalized solution of a PDE.

Definition 3.4 (Weak solution). By a weak solution of (3.4) we mean a $\Phi'$-valued process $(X_t)$ defined on some probability space $(\Omega, \mathcal{F}, P)$ with a reference filtration $(\mathcal{F}_t)_{t \ge 0}$ such that

1) there exists an $(\mathcal{F}_t)$-adapted Poisson random measure $N$ with intensity measure $\mu(du)\,ds$;

2) $(X_t)_{0 \le t \le T} \in D([0,T],\Phi')$ and the map $\omega \mapsto X(t,\omega)$ is measurable with respect to the natural $\sigma$-fields provided in the problem;

3) (3.8) and (3.9) are satisfied.


Our aim in studying these nonlinear SDE's has been to concentrate on the weak solution. We have been able to obtain the existence of a weak solution under conditions which do not include the so-called monotonicity condition stated below. For $t \in [0,T]$ and $v_1, v_2 \in H_{-p}$,

$$2\langle A(t,v_1) - A(t,v_2), v_1 - v_2\rangle_{-q} + \int_U \|G(t,v_1,u) - G(t,v_2,u)\|_{-q}^2\,\mu(du) \le K\|v_1 - v_2\|_{-q}^2.$$

However, we can prove uniqueness only by assuming the monotonicity condition, in which case the solution turns out also to be strong. The problem of proving uniqueness of the weak solution without assuming the monotonicity condition is open. We will not discuss these questions further in this article.

Our  main  result  concerning  (3.4)  is 
Theorem 3.5. Under conditions (3.5)-(3.7) and (3.9), the equation (3.4) has a weak solution. Furthermore, there exists an integer $p$ (possibly depending on $T$) such that $(X_t)_{0 \le t \le T} \in D([0,T],H_{-p})$ a.s.

The proof of Theorem 3.5 and the existence of a weak solution in $D(\mathbb{R}_+, \Phi')$ for all $t > 0$, as well as other properties of these SDE's, are given in a forthcoming paper [3].


Remark 3.6. In the SDE's (3.4) and (3.10) below it is tacitly assumed that the stochastic integrals

$$\int_0^t\!\!\int_U G(s,X_{s-},u)\,\tilde N(du\,ds) \quad \text{and} \quad \int_0^t B(s,X_s)\,dW_s$$

are defined and belong to $\Phi'$. A demonstration of this fact (as well as many other details in this exposition) would take up far too much space and is therefore omitted.


Definition 3.7 (Diffusion equation). The diffusion equation in $\Phi'$ is of the form

$$dX_t = A(t,X_t)\,dt + B(t,X_t)\,dW_t \ \ (t > 0), \qquad X_0 = \xi, \tag{3.10}$$

where $\xi$ is a $\Phi'$-valued random variable independent of $(W_t)$, a Wiener process which is a martingale with respect to the filtration $(\mathcal{F}_t)$ and has $Q$ as its covariance kernel. We will not repeat here the definitions of weak and strong solution which, with appropriate changes, are the same as the ones given earlier for the discontinuous SDE.

The theory of equation (3.10) on the existence and uniqueness of solution has been worked out in detail in [5].

The drift coefficient $A$ maps $\mathbb{R}_+ \times \Phi'$ into $\Phi'$ and the diffusion coefficient $B$ (which is the new feature here) is operator-valued, i.e.,

$$B : \mathbb{R}_+ \times \Phi' \to L(\Phi',\Phi')$$

where $L(\Phi',\Phi')$ is the space of continuous linear mappings from $\Phi'$ to $\Phi'$. Let $q > 0$ be such that $W \in C(\mathbb{R}_+, H_{-q})$. Then clearly $W_t \in H_{-r}$ for any $r \ge q$. In stating the conditions let us fix $r \ge q$. The quadratic form $Q$, defined on $\Phi \times \Phi$, then has a continuous extension to a nuclear form on $H_r \times H_r$. The finite quantity

$$|Q|_{-r,-r} = \sum_j Q(h_j^r, h_j^r)$$

is the trace norm of $Q$, $(h_j^r)$ being a CONS in $H_r$. For any $A \in L(\Phi',\Phi')$ let $A^* \in L(\Phi,\Phi)$ be the usual adjoint of $A$. Denote the quadratic form $Q[A^*\varphi, A^*\psi]$ by $Q_A[\varphi,\psi]$.

The coercivity, growth and monotonicity conditions now take the following form. For each $T > 0$ and sufficiently large $m \ge r$, there exists a constant $\theta > 0$ and an index $p \ge m$ such that for $t \in [0,T]$,

(Coercivity) $2A(t,\varphi)[\varphi] + |Q_{B(t,\varphi)}|_{-m,-m} \le \theta(1 + \|\varphi\|_{-m}^2)$, $\varphi \in \Phi$ $(\subset H_{-m})$;

(Growth) If $u \in H_{-m}$ then $A(t,u) \in H_{-p}$ and

$$\|A(t,u)\|_{-p}^2 + |Q_{B_t(u)}|_{-m,-m} \le \theta(1 + \|u\|_{-m}^2);$$

(Monotonicity) For $u, v \in H_{-m}$ $(\subset H_{-p})$,

$$\langle A(t,u) - A(t,v), u - v\rangle_{-p} + |Q_{B_t(u) - B_t(v)}|_{-p,-p} \le \theta\|u - v\|_{-p}^2.$$

In addition to the joint continuity of $A$ and $B$ it is further assumed that $B_t(u)v \in H_{-m}$ if $u, v \in H_{-m}$ and that $Q[B_t^*(u)\varphi, B_t^*(u)\varphi]$ is continuous in $u \in \Phi'$ for each $\varphi \in \Phi$. The condition on the initial variable $X_0$ takes the form
E  II  Xo  Iliif  <  oo  for  6  >  0. 

Theorem 3.8. Under the above conditions and a slightly weaker initial condition, it is shown in [5] that the SDE has a unique strong solution $(X_t)$ which has the further property that $(X_t)_{0 \le t \le T} \in C([0,T], H_{-p})$.

Remark 3.9. In the case of both equations (3.4) and (3.10) we have obtained solutions over a finite interval $[0,T]$ and in this case the paths lie in $D([0,T],H_{-p})$ and $C([0,T],H_{-p})$ respectively. $H_{-p}$ is a much smaller space than $\Phi'$. As we increase the time interval, the indices $p$ increase to $\infty$ in general.

If we are interested in a solution to (3.10) for all $t \ge 0$, then using Theorem 3.8 with $T = 1, 2, \ldots$ and "pasting" together the solutions on $[0,n]$ $(n = 1, 2, \ldots)$ we obtain the unique solution $X = (X_t)_{0 \le t < \infty}$ to (3.10), which has the property $X \in C(\mathbb{R}_+,\Phi')$ a.s. It is not true in general that there is an integer $p$ such that $X \in C(\mathbb{R}_+,H_{-p})$ a.s. An example, due to S. Ramaswamy and me, is given in [4].

We might also mention here a technical advantage in working with the dual of a CHNS $\Phi$ rather than with a Hilbert or Banach space specified in advance in which the solution has to live. It is that $\Phi'$ belongs to a class of infinite dimensional linear topological spaces which have the property (possessed by finite dimensional Euclidean spaces) that bounded, closed subsets are compact. A paper of Mitoma's gives criteria for tightness of sequences of probability measures on $\Phi'$ that are easier to use than tightness conditions for Hilbert and Banach spaces [13].
4. Diffusion approximation [3]

Let

$$X_t^n = X_0^n + \int_0^t A^n(s,X_s^n)\,ds + \int_0^t\!\!\int_U G^n(s,X_{s-}^n,u)\,\tilde N^n(du\,ds) \tag{4.1}$$

and let

$$\sup_{n \ge 1} E\|X_0^n\|_{-p}^2 < \infty \quad \text{for some } p. \tag{4.2}$$

Assume the same conditions on (4.1) as in the previous section, with the proviso that the constants appearing in the coercivity, growth and monotonicity conditions do not depend on $n$.

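Proposition 4.1. Let $\{X_0^n\}$ be tight in $H_{-p}$. Then $\{X^n\}$ is tight in $D([0,T], H_{-p})$.

Let $Q$ be a continuous quadratic form on $\Phi$ and let $A : \mathbb{R}_+ \times \Phi' \to \Phi'$, $B : \mathbb{R}_+ \times \Phi' \to L(\Phi',\Phi')$ satisfy the following conditions. For each $\varphi \in \Phi$, as $n \to \infty$,

$$X_0^n \Rightarrow X_0, \tag{4.3}$$

$$A^n(s,v)[\varphi] \to A(s,v)[\varphi], \tag{4.4.1}$$

and

$$\int_U G^n(s,v,u)[\varphi]^2\,\mu^n(du) \to Q(B_s(v)^*\varphi, B_s(v)^*\varphi), \tag{4.4.2}$$
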

IG"(s,  v,u)[<pll’n'’(du)  -4  0, 

.  u 

(4.4.3) 

uniformly in $(s,v) \in [0,T] \times A_M$, where $A_M = \{v \in H_{-p} : \|v\|_{-p} \le M\}$.


supE  |]  X^  ||ip<  oo.  (4.5) 

n 

Under  these  conditions  we  have  the  following  result. 

Proposition 4.2. If $P^*$ is a cluster point of $P^n$, the probability measure induced by $X^n$ of Proposition 4.1 on $D([0,T], H_{-p})$, then

$$P^*(C([0,T],H_{-p})) = 1. \tag{4.6}$$

We now wish to relate $P^*$ to the diffusion equation

$$dX_t = A(t,X_t)\,dt + B(t,X_t)\,dW_t \tag{4.7}$$

with initial condition $X_0$ independent of $W$. Let $\mathcal{D}_0^\infty(\Phi') = \{F : \Phi' \to \mathbb{R},\ F(v) = h(v[\varphi])$ for $h \in C_0^\infty(\mathbb{R})$ and $\varphi \in \Phi\}$. For $F \in \mathcal{D}_0^\infty(\Phi')$ define

$$\mathcal{L}_sF(v) = A(s,v)[\varphi]\,h'(v[\varphi]) + \tfrac{1}{2}h''(v[\varphi])\,Q[B_s^*(v)\varphi, B_s^*(v)\varphi]$$

and let

$$M^F(Z)_t = F(Z_t) - F(Z_0) - \int_0^t \mathcal{L}_sF(Z_s)\,ds.$$

Denoting by $(\mathcal{B}_t)$ the canonical filtration in $D([0,T],H_{-p})$, we define a probability measure $P$ on $(C([0,T],H_{-p}),\ \mathcal{B} \cap C([0,T],H_{-p}))$ as a solution of the $\mathcal{L}$-martingale problem if for each $F \in \mathcal{D}_0^\infty(\Phi')$, $M^F(Z)_t$ is a $P$-martingale.

Our next proposition is:

Proposition 4.3. $P^*$ is a solution of the $\mathcal{L}$-martingale problem.

To get our final result, we assume the monotonicity condition for $A$ and $B$: for all $v, v' \in H_{-p}$, $t \in [0,T]$, and for some $q \ge p$,

$$\langle A_t(v) - A_t(v'), v - v'\rangle_{-q} + |Q_{B_t(v) - B_t(v')}|_{-q,-q} \le K\|v - v'\|_{-q}^2. \tag{4.8}$$

Theorem 4.4 (Diffusion approximation). Under the conditions of the preceding section on $X_0^n$, $A^n$ and $G^n$ and conditions (4.1)-(4.8), we have that $P^n$ converges weakly to $P^*$, where $P^*$ is the probability law of the unique strong solution of the diffusion SDE

$$dX_t = A(t,X_t)\,dt + B(t,X_t)\,dW_t \tag{4.9}$$

with $X_0$ independent of $W$, where $(W_t)$ is a $\Phi'$-valued Wiener process with $EW_t[\varphi]W_s[\psi] = (t \wedge s)Q(\varphi,\psi)$.

Remark 4.5. Actually it can be shown that $(X_t)_{0 \le t \le T}$ lies in $C([0,T], H_{-r})$ a.s., where the index $r$ will, in general, depend on $T$.
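The mechanism behind Theorem 4.4 can be illustrated with a scalar toy model in which jumps become smaller and more frequent as $n$ grows: the second moment of the jump coefficient against the intensity measure stays fixed (the analogue of (4.4.2)), while a higher absolute moment vanishes (the analogue of (4.4.3)). The sketch below is an assumption-laden illustration, not part of the text: the two-point measure, the scaling $G^n(u) = u/\sqrt{n}$, and the use of the third absolute moment as the higher moment are all choices made here.

```python
import numpy as np

def jump_moments(n):
    """Second and third absolute moments of G^n against mu^n for a toy model.

    Toy assumption: mu^n puts mass n/2 on each of the marks u = -1, +1 and
    G^n(u) = u / sqrt(n).
    """
    marks = np.array([-1.0, 1.0])
    masses = np.array([n / 2.0, n / 2.0])
    g = marks / np.sqrt(n)
    second = np.sum(g**2 * masses)           # stays equal to 1: the limiting variance rate
    third = np.sum(np.abs(g) ** 3 * masses)  # equals n**(-1/2): vanishes as n grows
    return second, third

for n in (1, 100, 10_000):
    print(n, jump_moments(n))
```

In this toy case the compensated jump process has variance $t$ for every $n$, which is the variance of the limiting Wiener process.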


4.1.  Application 

We will give just one application of Theorem 4.4 to obtain the approximation of the Poisson Ornstein-Uhlenbeck SDE to the usual (i.e., Wiener) Ornstein-Uhlenbeck SDE (see [7]). As before, let $-L$ be the generator of the semigroup $T_t$ on $H = L^2(\mathcal{X})$ (where $(H, T_t, \Phi)$ is a compatible family). The Poisson Ornstein-Uhlenbeck SDE's are given by

$$\xi_t^n = \xi_0^n - \int_0^t L'\xi_s^n\,ds + X_t^n$$

where

$$X_t^n[\varphi] = t\,m^n[\varphi] + \int_0^t\!\!\int_{\mathbb{R} \times \mathcal{X}} a\,\varphi(x)\,\{N^n(da\,dx\,ds) - \mu^n(da\,dx)\,ds\} = t\,m^n[\varphi] + M_t^n[\varphi].$$

Then

$$\langle M^n[\varphi]\rangle_t = t\,Q^n(\varphi,\varphi)$$

where

$$Q^n(\varphi,\varphi) = \int_{\mathbb{R} \times \mathcal{X}} a^2\varphi(x)^2\,\mu^n(da\,dx).$$

Suppose the following conditions are satisfied.

• $m^n \in \Phi'$ and $Q^n$ is continuous on $\Phi \times \Phi$;

• $\sup_n E\|\xi_0^n\|_{-r_2}^2 < \infty$ and $\xi_0^n \Rightarrow \xi_0$;

• $m^n[\varphi] \to m[\varphi]$;

• $Q^n[\varphi,\psi]$ converges to a continuous limit $Q[\varphi,\psi]$;
■  J:i;^xa:l‘^0(x)|-V''(dadx)  ->  0. 

Then the conditions of Theorem 4.4 hold and $\xi^n$ converges weakly to $\xi$, the strong solution of $d\xi_t = -L'\xi_t\,dt + dW_t$ ($\xi_0$ initial value independent of $W$), where $W$ is an $H_{-q}$-valued Wiener process with covariance kernel $Q$.
Outlined is a bi-measure theoretic framework, including the fundamental Grothendieck inequality and factorization theorem, of which self-contained proofs are presented. Stochastic integrators have a natural description in this framework.


1. The problem of stochastic integration: a (biased) overview


Let $\Omega = (\Omega, \mathfrak{A}, P)$ be a probability space and let

$$X = \{X_\omega(t) : \omega \in (\Omega,\mathfrak{A},P),\ t \in [0,1]\}$$

be a stochastic process. We often encounter processes almost all of whose sample paths are of unbounded variation. This property, effectively conveying the non-deterministic nature of a stochastic model, naturally brings up the issue of stochastic integration. The basic question is this: given a real-valued process $X$ almost all ($P$) of whose sample paths $\{X(t) : t \in [0,1]\}$ are of unbounded variation, how can we make sense of an integral (a random variable on $(\Omega,\mathfrak{A},P)$)

$$\int_0^1 F_\omega(t)\,dX_\omega(t) = \int_0^1 F(t)\,dX(t) \tag{1.1}$$

where $F_\omega(t)$ ($\omega \in \Omega$, $t \in [0,1]$) is a "random" function, and integration is performed over $[0,1]$?

The obvious difficulty is that for almost all $\omega$, $F_\omega(t)$ cannot be integrated against $dX_\omega(t)$ in the ordinary Riemann-Stieltjes sense. To negotiate this obstacle, one usually follows a functional analytic approach. We start with a simple function on $\Omega \times [0,1]$ (a simple process)

$$F_\omega(t) = \sum_i a_i\,1_{A_i}(\omega)\,1_{(s_i,t_i]}(t), \qquad A_i \in \mathfrak{A},\ 0 \le s_i \le t_i \le 1, \tag{1.2}$$

and define its integral with respect to $X$ by

$$\int_0^1 F\,dX = \sum_i a_i\,1_{A_i}\,(X(t_i) - X(s_i)). \tag{1.3}$$


In  order  to  obtain  a  larger  class  of  integrands  and  a  corresponding  extension 
of  (1.3),  we 

1)  make  some  (reasonable)  assumptions  about  X,  and  then 

2)  propose  an  appropriate  metric  on  simple  processes,  convergence  in 
which  will  imply  convergence,  in  some  sense,  of  the  respective  random 
variables  in  (1.3). 
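The elementary integral (1.3) is a finite sum and can be evaluated pathwise, one sample point at a time. The following sketch shows this under assumptions made only for illustration: the function name is hypothetical, each event $A_i$ is represented by a predicate on $\omega$, and the sample path is passed in as a callable.

```python
def integrate_simple_process(terms, omega, path):
    """Evaluate (1.3) at one sample point omega.

    terms: iterable of (a_i, in_A_i, s_i, t_i), where in_A_i(omega) -> bool is
           the indicator of the event A_i and 0 <= s_i <= t_i <= 1.
    path:  callable t -> X_omega(t), the sample path at omega.
    """
    total = 0.0
    for a, in_event, s, t in terms:
        if not (0.0 <= s <= t <= 1.0):
            raise ValueError("need 0 <= s_i <= t_i <= 1")
        if in_event(omega):
            total += a * (path(t) - path(s))  # a_i * 1_{A_i}(omega) * increment of X
    return total

# Hypothetical example: omega is a number in [0, 1) and the path is t -> omega * t**2.
terms = [
    (2.0, lambda w: w < 0.5, 0.0, 0.5),
    (-1.0, lambda w: True, 0.5, 1.0),
]
omega = 0.25
value = integrate_simple_process(terms, omega, lambda t: omega * t**2)
```

No regularity of the path is needed at this stage; the difficulty discussed in the text arises only when one passes to limits of such sums.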

N. Wiener was first to integrate deterministic $L^2$-functions with respect to Brownian motion [21, 17]. K. Itô, a pioneer in the area of stochastic calculus, replaced the deterministic integrands in Wiener's integral with random functions (e.g., [9, 10]). Itô's construct, recast in a framework of martingales [6], has evolved into a stochastic integral where $X$ is a semimartingale (= martingale + a finite variation process) and the integrand is predictable (see [5, 12, 15, 19]).

To motivate our point of view, we note that the Itô integral is based on a crucial $L^2$-isometry between simple predictable processes (in (1.2) above, $A_i$ is $X(s_i)$-measurable) and their respective stochastic integrals (e.g., [5], Theorem 2.3). In this sense, the summation method underlying the Itô integral is analogous to conditional summability of series. Integrals involving predictable processes and semimartingales are thus stochastic Riemann-Stieltjes integrals, inextricably linked to the idea of filtration. A hypothesis of increasing $\sigma$-fields is certainly natural when a stochastic model is indexed by time. In a general format, however, "filtrations" do not always play an obvious role (e.g., for processes indexed by spatial parameters). In the specific context of stochastic integration, consider for example any $L^2$-bounded process with orthogonal increments; it need not be a semimartingale, but still qualifies as an integrator (see Example 5.2 in Section 5).

In this paper, which focuses on Item 1 above, stochastic integration is viewed on a primal level: a process $X$ will be an integrator if bounded deterministic functions are integrable with respect to $X$ (the notion of integrability will be made precise in due course). The idea of an integrator generalizes the role assigned to Brownian motion in the Wiener integral. Since deterministic functions are adapted to every filtration, a semimartingale [19] will a fortiori be an integrator in our sense. However, filtrations and related analysis of "time" do not have preassigned roles in our setting. Indeed, their absence at the outset facilitates constructions of Lebesgue-Stieltjes stochastic integrals (Item 2 above), which will be described in a later paper.

This article, which is an overview of three previous papers [2, 3, 4], outlines a format of bi-measure theory, including the fundamental Grothendieck inequality and factorization theorem (Theorems 3.1 and 3.5 in Section 3). Stochastic integrators are then naturally viewed in this framework. Examples are described in the last section of the paper.

The  next  two  questions  of  interest — how  to  deal  with  integrators  in  sev¬ 
eral  parameters  and  how  to  deal  with  random  integrands — will  be  discussed 
in  subsequent  work. 


2.  The  framework 


Let $(\mathcal{X}, \mathcal{A})$ and $(\mathcal{Y}, \mathcal{B})$ be measurable spaces. A $\mathbb{C}$-valued function $\mu$ on measurable rectangles in $\mathcal{X} \times \mathcal{Y}$ is a bimeasure if $\mu$ is a $\mathbb{C}$-measure separately in each coordinate, i.e., for each $E \in \mathcal{A}$ and $F \in \mathcal{B}$, $\mu(E \times \cdot)$ and $\mu(\cdot \times F)$ are $\mathbb{C}$-measures on $(\mathcal{Y},\mathcal{B})$ and $(\mathcal{X},\mathcal{A})$, respectively. The term bimeasure originates in the works of Morse and Transue (e.g., [16]); in [1] we refer to these and more general set functions as Fréchet pseudomeasures. We proceed to catalogue below, without proofs, some basic facts regarding bimeasures.


1) The Fréchet variation of a bimeasure $\mu$ is defined as

$$\|\mu\| = \sup\Big\{\Big\|\sum_{i,j=1}^{N} \mu(E_i \times F_j)\,r_i \otimes r_j\Big\|_\infty : \{E_i\}, \{F_j\} \text{ measurable partitions of } \mathcal{X} \text{ and } \mathcal{Y}, \text{ respectively},\ N > 0\Big\}. \tag{2.1}$$

In (2.1), $\{r_j\}_{j \in \mathbb{N}}$ denotes the usual system of Rademacher functions on $[0,1]$, and $r_i \otimes r_j(s,t) = r_i(s)r_j(t)$, $(s,t) \in [0,1]^2$, $(i,j) \in \mathbb{N}^2$. The Rademacher system here is merely a convenient device generating arbitrary choices of signs: given any $\epsilon_i = \pm 1$, $i \in \mathbb{N}$, there is $t \in [0,1]$ so that $r_i(t) = \epsilon_i$, $i \in \mathbb{N}$. The starting point of multi-measure theory is the assertion: if $\mu$ is a bimeasure, then $\|\mu\|$ is finite. This statement, verified by applying the machinery of vector valued measures (e.g., [7], IV.10), is of course the two-dimensional extension of the statement that the total variation of every $\mathbb{C}$-measure is finite.
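On finite sets the Fréchet variation in (2.1) reduces to a maximum over sign patterns, which makes it easy to compare with the total variation. The sketch below is an illustration under assumptions: the bimeasure is a real matrix whose entry `a[i, j]` is the mass of the point $(i,j)$, every subset is measurable, and the partitions into single points are used, which is where the supremum is attained in this finite setting.

```python
import itertools
import numpy as np

def frechet_variation(a):
    """Frechet variation of a real matrix a, viewed as a bimeasure on two finite sets.

    Computes the max over sign vectors eps, delta of |sum_ij a[i, j] eps[i] delta[j]|.
    For fixed eps the best delta matches the signs of eps @ a, so only eps is enumerated.
    """
    a = np.asarray(a, dtype=float)
    best = 0.0
    for eps in itertools.product((-1.0, 1.0), repeat=a.shape[0]):
        best = max(best, np.abs(np.array(eps) @ a).sum())
    return best

a = np.array([[1.0, 1.0], [1.0, -1.0]])
print(frechet_variation(a), np.abs(a).sum())  # Frechet variation vs. total variation
```

For this matrix the Fréchet variation is 2 while the total variation is 4; the gap between the two quantities is what distinguishes bimeasures from measures on the product space when the underlying sets are infinite.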

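2) Let $f$ and $g$ be bounded measurable functions on $\mathcal{X}$ and $\mathcal{Y}$, respectively, and define $\mu_f(F) = \int_{\mathcal{X}} f(x)\,\mu(dx \times F)$ and $\mu_g(E) = \int_{\mathcal{Y}} g(y)\,\mu(E \times dy)$, $E \in \mathcal{A}$, $F \in \mathcal{B}$. We deduce that $\mu_f$ and $\mu_g$ are $\mathbb{C}$-measures on $\mathcal{Y}$ and $\mathcal{X}$, respectively, and then define the integral of $f \otimes g$ with respect to $\mu$ by

$$\int_{\mathcal{X} \times \mathcal{Y}} f \otimes g\,d\mu = \int_{\mathcal{Y}} g(y)\,\mu_f(dy) = \int_{\mathcal{X}} f(x)\,\mu_g(dx). \tag{2.2}$$

We have the following estimates:

$$\|\mu_f\| \le \|f\|_\infty\|\mu\|, \qquad \|\mu_g\| \le \|g\|_\infty\|\mu\|, \qquad \Big|\int_{\mathcal{X} \times \mathcal{Y}} f \otimes g\,d\mu\Big| \le \|f\|_\infty\|g\|_\infty\|\mu\|. \tag{2.3}$$

3) The estimates in (2.3) easily imply that for all $\epsilon > 0$ and all bounded measurable functions $f$ and $g$ on $\mathcal{X}$ and $\mathcal{Y}$, respectively, there are simple functions $\varphi$ and $\psi$ on $\mathcal{X}$ and $\mathcal{Y}$, respectively, so that

$$\Big|\int_{\mathcal{X} \times \mathcal{Y}} f \otimes g\,d\mu - \int_{\mathcal{X} \times \mathcal{Y}} \varphi \otimes \psi\,d\mu\Big| < \epsilon. \tag{2.4}$$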

3.  Some  basic  tools  (with  proofs) 


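Theorem 3.1 (a version of Grothendieck's fundamental inequality [8]). Let $\mathcal{X}$ and $\mathcal{Y}$ be measurable spaces and let $\mu$ be a bimeasure on $\mathcal{X} \times \mathcal{Y}$. Let $\{f_k\}_{k \in \mathbb{N}}$ and $\{g_k\}_{k \in \mathbb{N}}$ be sequences of bounded measurable functions on $\mathcal{X}$ and $\mathcal{Y}$, respectively, satisfying

$$\Big\|\sum_{k \in \mathbb{N}} |f_k|^2\Big\|_\infty \le 1, \qquad \Big\|\sum_{k \in \mathbb{N}} |g_k|^2\Big\|_\infty \le 1. \tag{3.1}$$

Then

$$\Big|\sum_{k \in \mathbb{N}} \int_{\mathcal{X} \times \mathcal{Y}} f_k \otimes g_k\,d\mu\Big| \le K\|\mu\|, \tag{3.2}$$

where $K > 0$ is a universal constant.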


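Proof 3.2. Fix an arbitrary $\epsilon > 0$. For each $k \in \mathbb{N}$, choose simple functions $\varphi_k$ on $\mathcal{X}$ and $\psi_k$ on $\mathcal{Y}$ so that

$$\Big|\int_{\mathcal{X} \times \mathcal{Y}} f_k \otimes g_k\,d\mu - \int_{\mathcal{X} \times \mathcal{Y}} \varphi_k \otimes \psi_k\,d\mu\Big| < \epsilon/2^k, \tag{3.3}$$

$$\Big\|\sum_{k \in \mathbb{N}} |\varphi_k|^2\Big\|_\infty \le 1, \qquad \Big\|\sum_{k \in \mathbb{N}} |\psi_k|^2\Big\|_\infty \le 1. \tag{3.4}$$

To establish the theorem, it suffices to verify

$$\Big|\sum_{k \in \mathbb{N}} \int_{\mathcal{X} \times \mathcal{Y}} \varphi_k \otimes \psi_k\,d\mu\Big| \le K\|\mu\|. \tag{3.5}$$

But the implication (3.4) $\Rightarrow$ (3.5), in view of the definition of $\|\mu\|$, is equivalent to the usual formulation of the Grothendieck inequality (cf. [13]), which we state and prove below (the verification of the equivalence is a simple exercise in transcribing notation from one framework to another).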
Fact 3.3 (Grothendieck's inequality (the usual formulation)). Let $(a_{mn})_{m,n \in \mathbb{N}}$ be an array of complex scalars so that for all sequences of complex scalars $(s_m)_{m \in \mathbb{N}}$ and $(t_m)_{m \in \mathbb{N}}$ ($|s_m| \le 1$, $|t_m| \le 1$, $m \in \mathbb{N}$),

$$\Big|\sum_{m,n=1}^{N} a_{mn} s_m t_n\Big| \le 1 \quad \text{for all } N > 0. \tag{3.6}$$

Then, for some universal constant $K > 0$ and for all $(x_m)_{m \in \mathbb{N}}$ and $(y_n)_{n \in \mathbb{N}}$, sequences of vectors in the unit sphere of $l^2$,

$$\Big|\sum_{m,n=1}^{N} a_{mn}\langle x_m, y_n\rangle\Big| \le K \quad \text{for all } N > 0 \tag{3.7}$$

($\langle\ ,\ \rangle$ denotes the usual inner product in $l^2$).
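A small numerical experiment can make the content of Fact 3.3 concrete. The sketch below works with real scalars only, which is an assumption that departs from the complex setting of the statement: it normalizes a random real matrix so that the sign-sum in (3.6) is at most 1, and then evaluates the left side of (3.7) on random unit vectors. In the real case the best constant is known to be below 1.783 (Krivine's bound), so no sample should exceed that value.

```python
import itertools
import numpy as np

def sign_norm(a):
    """max over eps, delta in {-1, +1} of |sum a[m, n] eps[m] delta[n]| (real scalars only)."""
    return max(np.abs(np.array(eps) @ a).sum()
               for eps in itertools.product((-1.0, 1.0), repeat=a.shape[0]))

def vector_sum(a, x, y):
    """|sum a[m, n] <x[m], y[n]>| for the rows x[m], y[n] of the arrays x and y."""
    return abs(np.sum(a * (x @ y.T)))

rng = np.random.default_rng(3)
a = rng.standard_normal((4, 5))
a /= sign_norm(a)                      # normalise so that the real analogue of (3.6) holds

worst = 0.0
for _ in range(2000):
    x = rng.standard_normal((4, 6))
    y = rng.standard_normal((5, 6))
    x /= np.linalg.norm(x, axis=1, keepdims=True)  # rows become unit vectors
    y /= np.linalg.norm(y, axis=1, keepdims=True)
    worst = max(worst, vector_sum(a, x, y))
```

Random sampling only gives a lower estimate of the supremum in (3.7); it is a sanity check, not a computation of the constant.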

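Proof 3.4 (of the usual formulation of Grothendieck's inequality). We can assume that $x_m$ and $y_n$ are vectors with real-valued coordinates.

Step 1: $\Big|\sum_{m,n=1}^{N} a_{mn} e^{\langle x_m, y_n\rangle}\Big| \le e$.

Verification. Let $(Z_k)_{k \in \mathbb{N}}$ be a sequence of independent standard normal variables. We shall make use of the following identity (an exercise): let $x$ and $y$ be real vectors on the unit sphere of $l^2$; then

$$e^{\langle x, y\rangle} = e\,E\Big[\exp\Big(i\sum_k x(k)Z_k\Big)\exp\Big(-i\sum_k y(k)Z_k\Big)\Big]. \tag{3.8}$$

We estimate

$$\Big|\sum_{m,n=1}^{N} a_{mn} e^{\langle x_m, y_n\rangle}\Big| = e\,\Big|E\sum_{m,n=1}^{N} a_{mn} e^{i\sum_k x_m(k)Z_k} e^{-i\sum_k y_n(k)Z_k}\Big| \quad \text{by (3.8)} \tag{3.9}$$

$$\le e\,E\Big|\sum_{m,n=1}^{N} a_{mn} e^{i\sum_k x_m(k)Z_k} e^{-i\sum_k y_n(k)Z_k}\Big| \le e \quad \text{by (3.6)}.$$

Step 2: There is a mapping $\Phi$ from $l^2$ into $l^2$ so that

$$\langle \Phi(x), \Phi(y)\rangle = e^{\langle x, y\rangle} - \langle x, y\rangle - 1, \tag{3.10}$$

$$\|\Phi(x)\|^2 = e^{\|x\|^2} - \|x\|^2 - 1, \qquad x \in l^2. \tag{3.11}$$

Verification. Merely observe that

$$(\langle x, y\rangle)^j = \sum_{k_1 \in \mathbb{N}, \ldots, k_j \in \mathbb{N}} x(k_1)y(k_1) \cdots x(k_j)y(k_j)$$

and that

$$\sum_{k_1 \in \mathbb{N}, \ldots, k_j \in \mathbb{N}} |x(k_1) \cdots x(k_j)|^2 = (\|x\|^2)^j.$$

Step 3: Let $b = (e - 2)^{1/2}$, and define $\Phi_b(x) = \Phi(x)/b$. Then, for all $x$ and $y$ in the unit sphere of $l^2$,

$$\langle x, y\rangle = \sum_{j=0}^{\infty} (-1)^j b^{2j}\big(e^{\langle \Phi_b^j(x), \Phi_b^j(y)\rangle} - 1\big) \tag{3.12}$$

($\Phi_b^j$ denotes the $j$-th iterate of $\Phi_b$).

Verification. Observe that $\Phi_b$ maps the unit sphere into itself. The iteration of the identity

$$\langle x, y\rangle = e^{\langle x, y\rangle} - 1 - b^2\langle \Phi_b(x), \Phi_b(y)\rangle$$

implies (3.12).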
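The series (3.12) can be checked numerically without constructing the map on $l^2$, because the inner product of the iterates depends only on the inner product at the previous stage: if $s = \langle x, y\rangle$ for unit vectors, then the next inner product is $(e^s - s - 1)/b^2$ with $b^2 = e - 2$. The sketch below, with hypothetical function names, sums the series from a starting value $s \in [-1, 1]$ and compares the result with $s$.

```python
import math

B2 = math.e - 2.0  # b^2 in Step 3

def iterate_inner_product(s):
    """Inner product of the next iterates, given s = <x, y> for unit vectors x, y."""
    return (math.exp(s) - s - 1.0) / B2

def series_3_12(s, n_terms=120):
    """Partial sum of the right-hand side of (3.12), started from s = <x, y>."""
    total, sign_power = 0.0, 1.0       # sign_power holds (-1)^j * b^(2j)
    for _ in range(n_terms):
        total += sign_power * (math.exp(s) - 1.0)
        s = iterate_inner_product(s)   # inner product at the next stage
        sign_power *= -B2
    return total
```

The remainder after $J$ terms is $(-1)^J b^{2J}$ times an inner product of unit vectors, so it is bounded by $(e-2)^J$ and the partial sums converge geometrically.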

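Combining Step 1 and Step 3, we deduce (3.7) as follows:

$$\Big|\sum_{m,n=1}^{N} a_{mn}\langle x_m, y_n\rangle\Big| = \Big|\sum_{m,n=1}^{N} a_{mn} \sum_{j=0}^{\infty} (-1)^j b^{2j}\big(e^{\langle \Phi_b^j(x_m), \Phi_b^j(y_n)\rangle} - 1\big)\Big|$$

$$\le \sum_{j=0}^{\infty} b^{2j}\,\Big|\sum_{m,n=1}^{N} a_{mn}\big(e^{\langle \Phi_b^j(x_m), \Phi_b^j(y_n)\rangle} - 1\big)\Big| \le (e + 1)/(3 - e).$$

The proof of Theorem 3.1 is complete.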

Theorem 3.5 (Grothendieck's Factorization Theorem, e.g., [18]). Let $\mathcal{X}$ be a locally compact Hausdorff space, and let $\mathcal{Y}$ be a measurable space. For every bimeasure $\mu$ on $\mathcal{X} \times \mathcal{Y}$ there is a probability measure $\nu$ on the Borel $\sigma$-field of $\mathcal{X}$ so that for all $f \in C_0(\mathcal{X})$ and $g \in L^\infty(\mathcal{Y})$ (= bounded measurable functions on $\mathcal{Y}$),

$$\Big|\int_{\mathcal{X} \times \mathcal{Y}} f \otimes g\,d\mu\Big| \le K\|f\|_{L^2(\mathcal{X},\nu)}\|g\|_\infty\|\mu\| \tag{3.13}$$

for some universal constant $K > 0$.


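Proof 3.6. Consider the Banach space $C_0(\mathcal{X}) \times L^\infty(\mathcal{Y})$ normed by

$$\|(f,g)\| = \max\{\|f\|_\infty, \|g\|_\infty\}, \qquad f \in C_0(\mathcal{X}),\ g \in L^\infty(\mathcal{Y}).$$

Without loss of generality, assume that $\|\mu\| = 1$ and let $K$ be the constant in Theorem 3.1 above. Let $W$ be the set consisting of all elements $(f,g) \in C_0(\mathcal{X}) \times L^\infty(\mathcal{Y})$ which can be written as

$$(f,g) = \Big(\sum_{k \in \mathbb{N}} |f_k|^2,\ \sum_{k \in \mathbb{N}} |g_k|^2\Big)$$

so that

$$\Big|\sum_{k \in \mathbb{N}} \int_{\mathcal{X} \times \mathcal{Y}} f_k \otimes g_k\,d\mu\Big| \ge K.$$

Theorem 3.1 implies that the convex set $W$ is disjoint from the open convex set

$$O = \Big\{(f,g) \in C_0(\mathcal{X}) \times L^\infty(\mathcal{Y}) : \max\Big\{\sup_{x \in \mathcal{X}} f(x),\ \sup_{y \in \mathcal{Y}} g(y)\Big\} < 1\Big\}.$$

Therefore, by the Hahn-Banach theorem, there is a bounded linear functional on $C_0(\mathcal{X}) \times L^\infty(\mathcal{Y})$ which "separates" $O$ from $W$. In particular (by the Riesz representation theorem) we obtain a regular Borel measure $\nu$ on $\mathcal{X}$ and a bounded linear functional $\gamma$ on $L^\infty(\mathcal{Y})$ (a regular Borel measure on the maximal ideal space of $L^\infty(\mathcal{Y})$) so that

$$\int_{\mathcal{X}} f\,d\nu + \gamma(g) \le 1 \quad \text{for all } (f,g) \in O, \tag{3.14}$$

$$\int_{\mathcal{X}} f\,d\nu + \gamma(g) \ge 1 \quad \text{for all } (f,g) \in W. \tag{3.15}$$

Since $O$ contains the cone of all "negative" elements as well as the open unit ball in $C_0(\mathcal{X}) \times L^\infty(\mathcal{Y})$, we deduce from (3.14) that $\nu$ and $\gamma$ are positive measures and that $\|\nu\| + \|\gamma\| \le 1$. Let $f \in C_0(\mathcal{X})$ and $g \in L^\infty(\mathcal{Y})$ be arbitrary, and assume that $\mu(f,g) := \int_{\mathcal{X} \times \mathcal{Y}} f \otimes g\,d\mu$ is nonzero. We have

$$|K/\mu(f,g)|\,(|f|^2, |g|^2) \in W,$$

and therefore, by (3.15),

$$|\mu(f,g)| \le K\Big(\int_{\mathcal{X}} |f|^2\,d\nu + \gamma(|g|^2)\Big). \tag{3.16}$$

Now define

$$F = \Big[\gamma(|g|^2) \Big/ \int_{\mathcal{X}} |f|^2\,d\nu\Big]^{1/4} f, \qquad G = \Big[\int_{\mathcal{X}} |f|^2\,d\nu \Big/ \gamma(|g|^2)\Big]^{1/4} g.$$

In (3.16), replace $f$ by $F$ and $g$ by $G$, and thus obtain

$$\Big|\int_{\mathcal{X} \times \mathcal{Y}} f \otimes g\,d\mu\Big| \le 2K\|f\|_{L^2(\mathcal{X},\nu)}\big(\gamma(|g|^2)\big)^{1/2}. \tag{3.17}$$

Replace $\nu$ by $\nu/\|\nu\|$ (clearly, $\nu \ne 0$), and deduce (3.13).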


Remark 3.7. Grothendieck's fundamental inequality is a generalization of
Littlewood's classical "mixed norm" inequality [14], and has enjoyed several
proofs (e.g., [18]). The self-contained proof given here is a version of the proof
in [2]; the novelty in the argument given above is Step 1.

The proof of Theorem 3.5 is a transcription of the proof of a more
general statement [3, Theorem 3.2]. It is an adaptation of the argument used
to prove the Pietsch factorization theorem (cf. [13], Prop. 3.1).


4.  Integrators 


Let $X = \{X(t) : 0 \le t \le 1\}$ denote an $\mathbb{R}$-valued stochastic process defined on a
probability space $(\Omega, \mathcal{A}, P)$, and assume that $E|X(0)| < \infty$. We norm $X$ by

$$\|X\| = \sup \Big\{ E \Big| \sum_{j=1}^{N} \epsilon_j \, \Delta X(I_j) \Big| : \epsilon_j = \pm 1,\ I_j = (t_{j-1}, t_j],\ t_{j-1} < t_j,\ j = 1, \dots, N,\ N > 0 \Big\} \tag{4.1}$$

($\Delta X(I) = X(b) - X(a)$ where $I$ is an interval with end points $a < b$). Observe
that for each $Y \in L^\infty(\Omega, P)$, the total variation of the function $E\,YX(t)$ ($t \in
[0,1]$) is bounded by $\|Y\|_\infty \|X\|$. We shall call $X$ an integrator if $\|X\| < \infty$.
Indeed, define $F_X(A,t) = E\,1_A X(t)$ and

$$\mu_X(A \times (s,t]) = F_X(A,t) - F_X(A,s), \quad A \in \mathcal{A},\ 0 \le s < t \le 1. \tag{4.2}$$

Observe that $\|X\| < \infty$ if and only if $\mu_X$ is a bimeasure on $\mathcal{A} \times (0,1]$, i.e., $\mu_X$
is a signed measure in the first coordinate and determines a Borel measure
in the second. A statement equivalent to $\|X\| < \infty$ is that $X(1)$, the outcome
at $t = 1$, does not depend on the way the "past" is arranged. To be precise,
let $\|X\| < \infty$, and assume without any loss of generality that $\lim_{t \to 1} X(t) =
X(1)$ (weak limit in $L^1(\Omega, P)$, i.e., $\lim_{t \to 1} E\,1_A X(t) = E\,1_A X(1)$ for all $A \in \mathcal{A}$).
Let $t_0 = 0$, $t_j \uparrow 1$ and $I_j = (t_{j-1}, t_j]$. Then for all permutations $\tau$ of $\mathbb{N}$,

$$X(1) - X(0) = \sum_{j=1}^{\infty} \Delta X(I_{\tau(j)}) \tag{4.3}$$

(weak convergence in $L^1(\Omega, P)$).


Theorem 4.1 (Interchangeability of stochastic increments). Suppose that
$E|X(t)| < \infty$ for every $t \in [0,1]$. Then, $X$ is an integrator if and only if (4.3)
holds for all $t_j \uparrow 1$ and all permutations $\tau$ of $\mathbb{N}$.

If $X$ is an integrator, then for every bounded measurable function $f$
on $[0,1]$, $\int f(t)\,\mu_X(\cdot \times dt)$ is a signed measure on $(\Omega, \mathcal{A})$ which is absolutely
continuous with respect to $P$. We define

$$\int_0^1 f(t)\,dX(t) = \frac{d}{dP} \int f(t)\,\mu_X(\cdot \times dt) \tag{4.4}$$

(Radon-Nikodym derivative) and obtain

$$E \Big| \int_0^1 f(t)\,dX(t) \Big| \le \|f\|_\infty \|X\|. \tag{4.5}$$


The Grothendieck factorization theorem (Theorem 2 in Section 3) implies

Corollary to 4.1. Let $X$ be a process with $\|X\| < \infty$. Then, there exists a
probability measure $\nu$ on $[0,1]$ so that for all $f \in L^2([0,1], \nu)$, the stochastic
integral $\int_{[0,1]} f(t)\,dX(t)$, extending the definition in (4.4), is a well defined
random variable in $L^1(\Omega, P)$, and

$$E \Big| \int_{[0,1]} f(t)\,dX(t) \Big| \le K \, \|f\|_{L^2(\nu)} \, \|X\|. \tag{4.6}$$

(A probability measure $\nu$ for which (4.6) holds will be called a factorizing
measure for $X$.)

In the case that $\mu_X$ is extendable to a signed measure on $\Omega \times [0,1]$,
stochastic integration of random integrands with respect to $X$ proceeds in
the usual framework of measure theory. In this case, $\nu$ of Corollary to 4.1 is
of course the total variation measure $|\mu_X|$. Stochastic integration has a life of
its own precisely because in most cases $\mu_X$ is not extendable to a bona fide
measure on $\Omega \times [0,1]$, i.e.,

$$\|X\|_{(1)} := \sup \Big\{ \sum_{i,j} |\mu_X(A_i \times I_j)| : \sum_i 1_{A_i} \le 1,\ \sum_j 1_{I_j} \le 1 \Big\} \quad \text{("total variation" of } \mu_X\text{)}$$

$$= \sup \Big\{ E \sum_j |\Delta X(I_j)| : \sum_j 1_{I_j} \le 1 \Big\} = \infty. \tag{4.7}$$

A process $X$ for which $\|X\| < \infty$ and $\|X\|_{(1)} = \infty$ will be called a stochastic
integrator.

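The Corollary to 4.1 implies a series representation of an integrator
$X$. Let $\{\varphi_k\}_{k \in \mathbb{N}}$ be an orthonormal basis of $L^2([0,1], \nu)$, define $\hat{X}(k) =
\int_{[0,1]} \varphi_k \, dX$, and write

$$X \sim \sum_{k \in \mathbb{N}} \hat{X}(k)\,\varphi_k. \tag{4.8}$$

Corollary to 4.1. For all $f = \sum_k \hat{f}(k)\,\varphi_k \in L^2([0,1], \nu)$,

$$\int_{[0,1]} f \, dX = \lim_{n \to \infty} \sum_{k=1}^{n} \hat{f}(k)\,\hat{X}(k) \tag{4.9}$$

(weak convergence in $L^1(\Omega, P)$).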


5.  Some  examples 

Example 5.1. Let $X$ be an $L^2$-bounded martingale process defined on $[0,1]$
and let $\nu(t) = E|X(t)|^2$ ($t \in [0,1]$). Then, $X$ is an integrator with a factorizing
measure $d\nu$ (an application of Grothendieck's factorization theorem is not
needed here). If $X$ is a non-constant continuous $L^2$-bounded martingale, then
the sample paths of $X$ are almost surely of unbounded variation (cf. [19],
Corollary 1, p. 64), implying in particular that $X$ is a stochastic integrator.

Example 5.2. More generally, any $L^2$-bounded process with orthogonal
increments is an integrator. In this general case, in contrast with the specific
case of $L^2$-bounded martingales, a factorizing measure is produced by an
application of the Grothendieck theorem. An $L^2$-bounded process $X$ with
orthogonal increments is a stochastic integrator if $X$ has a factorizing measure
$\nu$ which is absolutely continuous with respect to Lebesgue measure, and
which satisfies

$$E \Big| \int f \, dX \Big| \ge K \, \|X\| \, \|f\|_{L^2(\nu)} \tag{5.1}$$

for all $f \in L^2([0,1], \nu)$ and some fixed constant $K > 0$. A class of such
processes is canonically produced as follows: Let $(\Omega, P) = ([0,1], dx)$
($dx$ = Lebesgue measure), and fix a set of "random variables" $\{\cos 2\pi n t\}_{n \in S}$
with the property that the $L^1$-closure and the $L^2$-closure of the span of
$\{\cos 2\pi n t\}_{n \in S}$ are equal (sets $S$ with this property are called $\Lambda(2)$-sets, e.g.,
[20]). Denote this closure by $L^2_S$. Fix any unitary equivalence $U$ between
$L^2([0,1], dx)$ and $L^2_S$, and define $X(t) = U 1_{[0,t]}$. One verifies that $X$ is
an integrator ($\int_{[0,1]} f \, dX = Uf$, $f \in L^2([0,1], dx)$), that Lebesgue measure is a
factorizing measure for $X$, and that (5.1) is satisfied.

Example 5.3. Any $L^1$-bounded process with independent increments is an
integrator.  In  this  case,  in  the  absence  of  any  other  data  about  the  process,  we 
require  an  application  of  Grothendieck's  theorem  to  produce  a  factorizing 
measure. 

In a specific case, when $p > 1$, $p$-stable motion is a stochastic integrator
with a factorizing Lebesgue measure ($p$-stable motion means that
increments  are  independent  symmetric  p-stable  variables).  This  is  obtained 
directly  by  use  of  properties  of  p-stable  distributions,  without  the  use  of  the 
Grothendieck  theorem. 

Example  5.4.  Finally,  we  indicate  how  to  construct  stochastic  integrators  by 
methods  which  are  amenable  to  computer  simulations.  The  strategy  is 

1) to produce processes $X$ over finite probability spaces so that $\|X\|$ is
"small" and $\|X\|_{(1)}$ is "large", and then

2) to "paste" together these processes. We briefly describe the first step.


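Theorem 5.5 ([11], Chapter 6). Let $A_1, \dots, A_k$ be finite subsets of $\mathbb{N}$. There
exists (with "high" probability) a choice of signs $\epsilon_{i_1 \dots i_k} = \pm 1$, $(i_1, \dots, i_k) \in
A_1 \times \cdots \times A_k$, so that

$$\sum_{(i_1, \dots, i_k) \in A_1 \times \cdots \times A_k} \cdots \ \le\ D \, |A_1 \times \cdots \times A_k|^{1/2} \, \big( |A_1| + \cdots + |A_k| \big)^{1/2} \tag{5.2}$$

for some numerical constant $D > 0$ which depends only on $k$.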


Corollary to 5.5. Let $m$ be an arbitrary positive integer and let $\Omega_m$ denote
the uniform probability space $\{1, \dots, 2^m\}$. There are processes $X$ on $\Omega_m$ so
that $\|X\| \le 1$ and $\|X\|_{(1)} \ge K_m$, where $\lim_{m \to \infty} K_m = \infty$.


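Proof 5.6. We think of $\Omega_m$ as the compact abelian group $\{-1,1\}^m$ with its
usual normalized Haar measure. Let $F_m \subset \hat{\Omega}_m$ be arbitrary and let $\sigma$ be an
injection from $F_m$ onto $\{1, \dots, |F_m|\}$. Define a process $X$ on $\Omega_m$ by

$$X_\omega(t) = \frac{1}{|F_m|^{1/2}} \sum_{\chi \in F_m} 1_{[0,t]}\big( \sigma(\chi)/|F_m| \big)\, \chi(\omega), \quad t \in [0,1],\ \omega \in \Omega_m. \tag{5.3}$$

The definition of $X$ and harmonic analysis on $\Omega_m$ imply

$$\|X\|_{(1)} \ge |F_m|^{1/2} \quad \text{and} \quad \|X\| \le 1.$$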
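The explicit construction can be checked numerically on a small group. The following is a minimal sketch, assuming that $F_m$ is taken to be the whole dual group (all Walsh characters of $\{-1,1\}^m$) and that the process jumps by $\chi(\omega)/|F_m|^{1/2}$ at the points $\sigma(\chi)/|F_m|$; the function names are illustrative and not from the text. It computes $E \sum_j |\Delta X(I_j)|$ over the partition at the jump points, and a lower estimate of $\sup_\epsilon E|\sum_j \epsilon_j \Delta X(I_j)|$ from randomly drawn signs.

```python
import itertools
import math
import random


def walsh_characters(m):
    """All 2**m characters of {-1, 1}^m, each encoded as a tuple of coordinate indices."""
    return [s for r in range(m + 1) for s in itertools.combinations(range(m), r)]


def increment_table(m, chars):
    """table[w][j] = jump of X_w attached to the j-th character: chi_j(w) / sqrt(len(chars))."""
    scale = 1.0 / math.sqrt(len(chars))
    table = []
    for omega in itertools.product((-1, 1), repeat=m):
        table.append([scale * math.prod(omega[i] for i in subset) for subset in chars])
    return table


def total_variation(table):
    """E sum_j |Delta X(I_j)| under the uniform measure on the sample points."""
    return sum(sum(abs(v) for v in row) for row in table) / len(table)


def signed_norm_lower_estimate(table, trials=200, seed=0):
    """Largest E|sum_j eps_j Delta X(I_j)| found over randomly drawn sign vectors eps."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(trials):
        eps = [rng.choice((-1, 1)) for _ in table[0]]
        mean_abs = sum(abs(sum(e * v for e, v in zip(eps, row))) for row in table) / len(table)
        best = max(best, mean_abs)
    return best


chars = walsh_characters(4)                 # assumed choice: F_m = all 16 characters
table = increment_table(4, chars)
print(total_variation(table))               # should agree with sqrt(|F_m|)
print(signed_norm_lower_estimate(table))    # should not exceed 1 (orthogonality of characters)
```

The second quantity is only a lower estimate of the sign-supremum, since it samples sign vectors rather than enumerating them; the bound by 1 itself follows from the Cauchy-Schwarz inequality and the orthonormality of the characters.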


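Proof 5.7. Let $F_m \subset \{1, \dots, 2^m\}$ be arbitrary. By Theorem 5.5, we find (with
"high" probability) $\epsilon_{i\omega} = \pm 1$, $i \in F_m$, $\omega \in \Omega_m$, so that

$$\cdots \ \le\ C \, |F_m|^{1/2} \, 2^m, \tag{5.4}$$

where $C$ is a positive constant that is independent of $m$. Define a process $X$
on $\Omega_m$ by

$$X_\omega(t) = \frac{1}{4C\,|F_m|^{1/2}} \sum_{i \in F_m} 1_{[0,t]}\big( i/|F_m| \big)\, \epsilon_{i\omega}, \quad t \in [0,1],\ \omega \in \Omega_m. \tag{5.5}$$

The definition of $X$ and (5.4) imply

$$\|X\|_{(1)} \ge \frac{|F_m|^{1/2}}{4C} \quad \text{and} \quad \|X\| \le 1.$$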


Remark. In the case of explicit constructions of $X$ (Proof 5.6), the uniform
probability measure on $F_m$ is a factorizing measure. In the case of random
constructions of $X$ (Proof 5.7), a factorizing measure is obtained by an
application of Corollary to 4.1.
We give a formula for the Hausdorff dimension of certain fractals obtained
by a weakly ergodic Markovian process. This result is closely related to
the entropy of certain ergodic measures.


1.  Introduction 


To understand our problem we recall some elementary results. We consider
a Bernoulli trial on the two-state space described by the following sequence
of transition matrices:

$$P^{(n)} = \begin{pmatrix} \frac{1 + a_n}{2} & \frac{1 - a_n}{2} \\[2pt] \frac{1 + a_n}{2} & \frac{1 - a_n}{2} \end{pmatrix}, \tag{1.1}$$

where $0 \le a_n \le 1$, $n = 1, 2, \dots$ The measure $\mu$ on $(0,1)$ corresponding to this
process can be considered as an infinite product called the Rademacher-Riesz
product (R.R. product)

$$d\mu = \lim_{N \to \infty} \prod_{n=1}^{N} \big( 1 + a_n r_n(x) \big)\, dx,$$

where $r_n(x)$ is the Rademacher function associated with the positive integer
$n$ and the limit is in the weak* sense.

In the case where $a_n = a$, for any $n$, it is easy to estimate the Hausdorff
dimension, $\dim M$ (cf. [1]), of a Borel set $M$ of positive measure. In fact by a
well-known formula it suffices to estimate

$$\liminf_{n \to \infty} \frac{\log \mu(E_n(x))}{-n \log 2}$$

for $x$ a.e. with respect to $\mu$, where $E_n(x)$ is the segment $[k/2^n, (k+1)/2^n)$ containing $x$
for some $k = 0, 1, \dots, 2^n - 1$.


We sketch a proof based on the Birkhoff Ergodic Theorem. By elementary
calculations this is equal to

$$\frac{\log \big( \prod_{k=1}^{n} (1 + a\,r_k(x))\, 2^{-n} \big)}{-n \log 2} = 1 - \frac{1}{\log 2}\, \frac{1}{n} \sum_{k=1}^{n} \log\big( 1 + a\,r_k(x) \big).$$

For any $x \in [0,1]$ let $x = x_1 x_2 \cdots x_n \cdots$ be the binary expansion of $x$ and let

$$Tx = T(x_1 x_2 \cdots x_n \cdots) = x_2 x_3 \cdots x_n \cdots$$

be the shift operator. Since $r_n(x) = 1 - 2x_n$, $n = 1, 2, \dots$, the last formula can
be written:

$$1 - \frac{1}{\log 2}\, \frac{1}{n} \sum_{k=1}^{n} \log\big( 1 + a\,r_1(T^{k-1}x) \big).$$

One can easily show that $T$ is a measure-preserving transformation, and by
Birkhoff's Theorem (cf. [8]) the limit of the last expression is equal to

$$1 - \frac{1}{\log 2} \int_0^1 \log\big( 1 + a\,r_1(x) \big)\, d\mu(x) = 1 - \frac{\log\big[ (1+a)^{1+a} (1-a)^{1-a} \big]}{2 \log 2}.$$

The right hand side of this is the entropy of this process and so, in this
case the Hausdorff dimension is equal to the degree of the entropy. This is a
special case of the Shannon-McMillan Theorem, which states that for ergodic
measures the Hausdorff dimension is equal to the degree of the entropy [1].
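The constant-coefficient case lends itself to a quick numerical check. The sketch below is illustrative only (the function names are not from the text): it evaluates the entropy expression $-(p \log p + q \log q)/\log 2$ with $p = (1+a)/2$, $q = (1-a)/2$, which is algebraically the same as $1 - \log[(1+a)^{1+a}(1-a)^{1-a}]/(2\log 2)$, and compares it with $\log \mu(E_n(x)) / (-n \log 2)$ along one sample point $x$ drawn from $\mu$, assuming the digits of $x$ are independent under $\mu$ with $r_k = +1$ having probability $(1+a)/2$.

```python
import math
import random


def entropy_dimension(a):
    """-(p log p + q log q) / log 2 with p = (1 + a) / 2, q = (1 - a) / 2, for 0 <= a < 1."""
    weights = ((1 + a) / 2, (1 - a) / 2)
    entropy = -sum(w * math.log(w) for w in weights if w > 0)
    return entropy / math.log(2)


def empirical_local_dimension(a, n, seed=0):
    """log mu(E_n(x)) / (-n log 2) along one mu-random point x, for 0 <= a < 1."""
    rng = random.Random(seed)
    p = (1 + a) / 2
    log_mass = 0.0
    for _ in range(n):
        r = 1 if rng.random() < p else -1        # Rademacher value of the next digit under mu
        log_mass += math.log((1 + a * r) / 2)    # mu(E_n(x)) = prod_k (1 + a r_k(x)) / 2
    return log_mass / (-n * math.log(2))


a = 0.5
print(entropy_dimension(a))
print(empirical_local_dimension(a, 20000))       # close to the value above for large n
```

For $a = 0$ the measure is Lebesgue measure and the formula returns dimension 1, as expected.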

In this work we shall show that for a wide class of non-ergodic
non-homogeneous Markovian measures this theorem applies. More precisely,
we state:


2.  The  main  result 

Let

$$P^{(0)} = (p_1, p_2, \dots, p_r), \qquad P^{(n)} = \big( p_{ij}^{(n)} \big), \quad n = 1, 2, \dots, \quad 1 \le i, j \le r,$$

be a non-homogeneous Markovian stochastic process. For any $n$, let $\delta(P^{(n)})$
denote the ergodic index of $P^{(n)}$ (see [5]).

We suppose that,

1 

^  ^  ^0,  (2.1) 
n ,  1 

($N \to \infty$), where

$$P^{(m+1,\,n-1)} = P^{(m+1)} P^{(m+2)} \cdots P^{(n-1)}.$$

We denote by $\mu$ the measure associated with this process; then the Hausdorff
dimension of $\mu$ satisfies:

dim(supp  4)  $  -limjnf  ^  L  P  '“gT^U  (2-2) 

n  0  ^  i.j 

where  'Li  and  Li  =  (0, . . . ,  I, . . .  ,0)^  (1  is  in  position  i). 

A  sufficient  condition  for  the  equality  in  (2.2)  can  be  given  by  a  weak 
convergence  of  the  left  hand  side  of  (2.1 )  as  is  0(  ( logN )  “^  1. 

In a sense this is the entropy of this process. This result is more general
than the previous works in [2] and [6] as well as the Shannon-McMillan-Breiman
Theorem on the entropy of Markovian processes [1].

In the first work we dealt with the Bernoulli schema described by the
sequence of matrices $P^{(n)}$ as in (1.1). As before we write:


log  ulLn(x)) 
-nlog2 


1  1 
log  2  n 


L 


logd  +  Q^rv,(x)) 


l‘«gl’-°c)+^c(x)log 


Then because the Rademacher functions are independent random variables
with respect to $\mu$, we use Kolmogorov's law of large numbers to obtain formula (2.2)
for the special cases where $r = 2$, $p_{i1}^{(n)} = (1 + a_n)/2$, $p_{i2}^{(n)} = (1 - a_n)/2$.

In the second work we considered a Bernoulli schema on the $r$-state
space ($r \ge 2$). We described the measure $\mu$ as a generalized R.R. product
and derived (2.2).

Here we have to note the works of J. Peyrière [7] and G. Brown, W. Moran
and C.E. Pearce [3] on the Hausdorff dimension of trigonometric Riesz products
$\prod_{n=1}^{\infty} (1 + a_n \cos \lambda_n x)\,dx$, where $\lambda_n$ is a sequence of integers satisfying
a condition such as: for any $n$, $\lambda_n$ divides $\lambda_{n+1}$, or $\lambda_n/\lambda_{n+1} \to 0$, $n \to \infty$.

Next  we  describe  the  proof  of  the  main  result  and  in  the  last  section 
we  describe  our  representative  example.  It  seems  to  us  that  a  probabilistic 
approach  to  the  Hausdorff  dimension  of  trigonometric  Riesz  products  is 
possible;  we  hope  to  deal  with  this  in  another  work. 
2.1. The Hausdorff dimension of a Markovian process

We describe the measure $\mu$ associated with the non-homogeneous Markovian
stochastic process determined by a sequence of matrices:

$$P^{(0)},\ P^{(1)},\ \dots,\ P^{(n)},\ \dots$$

We assume that all matrices have dimension $r \times r$ except $P^{(0)}$, which has
dimension $1 \times r$. Notice that our result is still valid under the more general
assumption that all the matrix multiplications involved can be performed.

As in [6] we use generalized Rademacher functions. For each $i =
1, 2, \dots, r$ we define the sequence of Rademacher functions associated with
$i$: $R_n^i(x) = 1 - r\,\delta_{x_n, i}$, where $\delta_{\cdot,\cdot}$ is the Kronecker delta and $x_1 x_2 \cdots x_n \cdots$ is
the $r$-adic expansion of $x \in [0,1]$.

Observation 2.1. The measure $\mu$ associated with the stochastic Markovian
process with matrices $P^{(0)}, P^{(1)}, \dots, P^{(n)}, \dots$ is a product,


rc 

dp  =  Gn(x)dx, 

n  0 

where 

Ga(x)  =  I  +  (pf  -  p'l'lRj  (x)  +  •  ■  ■  +  IpI'  -  IRlj^'  (x) 
and  for  any  n  j  1: 


Proof.  It  is  easy  to  check.  | 
Observation 2.3.

1)  First, 

r' 

Rj,(x)dn(x)  = 

Jc 

2)  Second, 

pi 

R;„(x)Rj,dn(x)  =  pio.'’'-’>(iP""''-"ljE 

.  0 

where  '  =  P'^P'  ■  P’’^’,  E  =  ( I ,  I , 1  and  U  is  equal  with 

the  identity  matrix  (r  x  r)  in  all  elements  except  in  the  position  (i,il 
where  1  is  replaced  by  r  -  1 . 

Proof.  Easy.  I 

Proof  (of  the  main  result).  We  estimate  the  lim  inf  of 

logp(EN(x))  ^  I  _  llogGnlx)  ^  ^  Ln  1  ii 

N  log  r  N  log  r  H  ^ 

For  each  pair  of  (l,i)  we  shall  apply  the  law  of  large  numbers  for 
nEm  I  gTilx)  where  g;i(xl  =  !f|'.(x)logrp;',) 'r-. 

It  suffices  to  show  that  for  N  — '  oo  the  variance; 


Then by Chebyshev's inequality we have (2.2). Thus for (2.3) we show

lq(gl’>)u(g''i)  -  q(grig;"ll  <  Con.st6(P' (2.4). 
In  fact  from  Observation  2.3  the  left  hand  side  of  (2.4)  is 
|(P’^  logrp-yilp^  logrp'l)P^"'"'-"Bij 

lE.P''''’’  -’'BpE  -  p"'"  b„e|| 

const  'B„  |ec  -  (c  ^  i  (p! e]  | 

t;  constd  ('pi"’  "'j  , 

where  the  matrix  Bp  is  equal  with  the  zero  matrix  in  all  elements  except  in 
the  position  (i,j)  instead  of  0  it  has  1,  and  where  c  is  a  constant  depending  on 
i,  j,  m  and  n. 

For  the  equality  of  (2.2)  one  can  use: 

Theorem 2.4 (Davenport, Erdős and LeVeque). If $x_n > 0$ and the series
$\sum_{n=1}^{\infty} x_n/n$ converges, then there exists an increasing subsequence $n_k$ such
that $n_{k+1}/n_k \to 1$ and the series $\sum_{k} x_{n_k}$ converges.

■ 
3. Application: Measures on the Poisson boundary
of the free group of two generators


This was our initial example for this work. We describe the non-homogeneous
Markovian process on the tree that represents the reduced
words on the free group of two generators $\{a, b\}$. Let $\{1, 2, 3, 4\}$ be the
state space. We assume that to each state there corresponds one of the generators
$\{a, a^{-1}, b, b^{-1}\}$ respectively. The transition matrix $P^{(n)}$, $n = 1, 2, \dots$ is
the following:


p  ( « 1  _ 


/  '  * 

/  s  4  u  n 

0 

1  t  O  ,v 

f  Cl 

\  ^ 


1  -  Cl, 


Ki , 
—  Ci  I 


3— Cl, 


] 

1-On 

i  an 

1 

\ 

♦  Cl,, 

I-Un 

3 

—  Cl  n 

.4-0,, 

\_ 

f  Cl,, 

0 

f  On 

0 

l-On 

where $0 \le a_n$ and $\sup_n a_n < 1$. It is not hard to see that the ergodic
index satisfies $\delta(P^{(n)}) < \text{const} < 1$, and by an easy argument we have a
corresponding bound for $\delta(P^{(k,n)})$. Thus
the Hausdorff dimension of the boundary of this tree (cf. [4]) is given by the
equality of (2.2).
An iterated function system (IFS) on a compact metric space is a finite
collection of mappings of the space into itself. Under suitable conditions, an
attractor for the IFS can be defined in a natural way. If the mappings depend
on a parameter, the subset of the parameter space for which the corresponding
attractor is connected is called the Mandelbrot set of the family. Under some
suitable symmetry conditions, it can be shown that, for parameter values
not in the Mandelbrot set, the attractor is homeomorphic to the Cantor set.
We determine the Mandelbrot set for a family of nonlinear IFS's on the unit
triangle obtained from the problem of filtering the noise from a Markov
chain signal model.


1.  Introduction 

Let us consider a finite set of transformations $\mathcal{F} = \{F_i,\ i \in I_m\}$, $I_m = \{1, \dots, m\}$, acting
on a complete metric space $K$. For any input sequence $\{i_n\}$ taking values
in $I_m$, the sequence $\{s_n\}$ is generated through the recursion
$s_{n+1} = F_{i_n}(s_n)$ from any chosen initial point $s_0 \in K$. We refer to the
collection of all these sequences as the iterated function system generated
by $\mathcal{F}$ [2, 1].

If these transformations are contractive, i.e., Lipschitz with a Lipschitz
constant strictly smaller than one, it is easy to show that the IFS has an
attractor $S$, which is a subset of $K$ approached by $s_n$ as $n$ increases,
independently of the initial point and the input sequence [2, 7, 8]. For the precise
definition see the next section. Such an attractor $S$ is characterized as the
unique compact subset of $K$ which is $\mathcal{F}$-self-similar, that is $S = \bigcup_{i \in I_m} F_i(S)$.

J. Byrnes et al. (eds.), Probabilistic and Stochastic Methods in Analysis, with Applications.
© 1992 Kluwer Academic Publishers. Printed in the Netherlands.
Whenever each point $s \in S$ can be uniquely decoded, that is there exists
only one input sequence $\{i_n\}$ such that $s$ is the limit of $F_{i_1}(F_{i_2}(\dots(F_{i_n}(x))))$
(which in any case will not depend on $x \in K$), then the attractor is totally
disconnected. Under some symmetry condition on $\mathcal{F}$ the converse is
also true. Hence a remarkable connection is established between geometry
and dynamics.

If the input at time $n$ is chosen at random from a probability distribution
$p_{s_n}$ depending on the current value of $s_n$, but not on the previous ones, the
sequence $\{s_n\}$ is turned into a Markov process $\{S_n\}$. The study of such a
class of processes started with [12], [4] and later their asymptotic behaviour
has been investigated in [10], [11], with motivations arising from the theory
of learning. In particular the Romanian school [9] contributed to this field,
giving them the name of random systems with complete connections (to
which we will prefer the more recent one of iterated function systems with
place-dependent probabilities [3]). The main result of this theory is that if the
convergence to $S$ is exponential (as in the case of contractive transformations)
and a uniform positivity condition on the family of conditional probabilities
$p_s$ holds, then $\{S_n\}$ has a unique stationary probability law which is easily
seen to be supported by the attractor $S$. Thus the relevant dynamics takes
place on $S$ and, if $S$ is totally disconnected, it is possible to extract from $S_n$
the whole sequence of inputs applied up to time $n$.

In this paper we are interested in an important class of IFS on a compact
set $K$ with place-dependent probabilities which are obtained from some
problems of recursive estimation of Markov chains, already introduced in
[5]. Let $\{X_n\}$ be an irreducible and aperiodic Markov signal on the state space
$I_d$ and $\{Y_n\}$ be an observation process coming from a noisy memoryless
channel with values in $I_m$.

Let $P = [p_{ij}]$ be the transition matrix of $\{X_n\}$ and $\epsilon_{ij} > 0$ be the probability
of an observation $j \in I_m$ given that the signal at the same time is
$i \in I_d$.

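The predictive sequence:

$$S_n(i) = P(X_n = i \mid Y_j,\ j = 1, \dots, n-1), \quad i = 1, \dots, d, \tag{1.1}$$

is then an IFS with values in the unit $d$-dimensional simplex $\Sigma_d$, since it is
possible to write $S_{n+1} = F_{Y_n}(S_n)$, where

$$F_j(s)(h) = \frac{\sum_{i=1}^{d} p_{ih}\, \epsilon_{ij}\, s(i)}{\sum_{i=1}^{d} \epsilon_{ij}\, s(i)} \tag{1.2}$$

and

$$P(Y_n = j \mid S_n, S_{n-1}, \dots) = \pi_j(S_n) = \sum_{i=1}^{d} \epsilon_{ij}\, S_n(i). \tag{1.3}$$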
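The recursion for the predictor can be sketched in a few lines. This is an illustrative reading of the update rule, assuming the standard prediction-filter form $F_j(s)(h) = \sum_i p_{ih}\,\epsilon_{ij}\,s(i) / \sum_i \epsilon_{ij}\,s(i)$ and $\pi_j(s) = \sum_i \epsilon_{ij}\,s(i)$; the function names and the numerical matrices are hypothetical, with states and observations indexed from 0.

```python
def filter_map(P, eps, j, s):
    """F_j(s): condition the current prediction s on observation j, then propagate one step with P."""
    d = len(s)
    weights = [eps[i][j] * s[i] for i in range(d)]          # unnormalised posterior of the signal
    total = sum(weights)                                    # equals pi_j(s)
    return [sum(P[i][h] * weights[i] for i in range(d)) / total for h in range(d)]


def observation_probs(eps, s):
    """pi_j(s) = sum_i eps[i][j] * s[i], the place-dependent input distribution."""
    m = len(eps[0])
    return [sum(eps[i][j] * s[i] for i in range(len(s))) for j in range(m)]


# Hypothetical two-state signal observed through a noisy binary channel.
P = [[0.9, 0.1], [0.2, 0.8]]
eps = [[0.7, 0.3], [0.4, 0.6]]
s = [0.5, 0.5]
print(observation_probs(eps, s))      # a probability vector over observations
print(filter_map(P, eps, 0, s))       # the next prediction, again a point of the simplex
```

Because the rows of `P` and `eps` sum to one, both outputs are probability vectors, which is the sense in which each $F_j$ maps the simplex into itself.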
It is immediately checked that in general not all of the $F_j$'s will be
contractive transformations on $\Sigma_d$. Nonetheless the existence of the attractor
$S$ can be shown by using only the weaker condition:

$$\operatorname{diam}\,(F_{i_1} \circ \dots \circ F_{i_n})(\Sigma_d) \to 0 \tag{1.4}$$

which holds uniformly in the input sequence, as $n \to \infty$. Moreover such
a convergence is exponentially fast, which implies the uniqueness of the
stationary law for $\{S_n\}$ and the fact that it is supported by $S$. These results
can be found in [5].

The previous analysis allows one to study the extent of the memory of
the predictor through the geometry of the attractor. Under suitable symmetry
conditions such an attractor is totally disconnected if and only if the
predictor has infinite memory, i.e., $Y_{n-1}, Y_{n-2}, \dots$ can be extracted from
$S_n$ (and such a relation is continuous). Our interest is to identify the values
of the parameters for which this happens, at least for some family of
prediction problems.

Such a program has been completely pursued when both $\{X_n\}$ and
$\{Y_n\}$ are binary, in which case the attractor is either perfect and totally
disconnected, or it is an interval. For the ternary case we extend the results
contained in [5], which is the original contribution of the paper.

The organization of the paper is as follows. In Section 2 the basic
theory of attractors for IFS on a compact set is reviewed starting from the
assumption (1.4). In Section 3 we completely classify the connectedness of
the attractor in the ternary completely symmetric case, that is

$$p_{ij} = (1 - p)\,\delta_{ij} + \frac{p}{2}\,(1 - \delta_{ij}), \qquad \epsilon_{ij} = (1 - \epsilon)\,\delta_{ij} + \frac{\epsilon}{2}\,(1 - \delta_{ij}), \quad i, j = 1, 2, 3, \tag{1.5}$$

with the only assumption that $0 < \epsilon < 2/3$.

2.  IFS  on  a  compact  set  and  their  attractors 

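Let $K$ be a compact metric space and $\mathcal{F} = \{F_i,\ i \in I_m\}$ a family of continuous
transformations of $K$ into itself. Let us denote by $F_{i_1, \dots, i_n}$ the composition
$F_{i_1} \circ \dots \circ F_{i_n}$ and let

$$\mathcal{F}^n(C) = \bigcup_{(i_1, \dots, i_n) \in I_m^n} F_{i_1, \dots, i_n}(C), \quad C \subset K. \tag{2.1}$$

Now suppose that

$$\lim_{n \to \infty}\ \sup_{(i_1, \dots, i_n) \in I_m^n} \operatorname{diam} F_{i_1, \dots, i_n}(K) = 0. \tag{2.2}$$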

{ Piccioni, Regoli }

Such a condition is obviously satisfied whenever

$$d(F_i(x), F_i(y)) \le r_i\, d(x, y) \tag{2.3}$$

for some $r_i < 1$, $i \in I_m$, but on a compact set it is obviously weaker. The goal
of this section is to review the by now classical theory of attractors for IFS [2,
7, 8] starting from (2.2) rather than (2.3).

It is an immediate consequence of (2.2) that for any sequence $\{i_n\} \in I_m^{\mathbb{N}}$
the sequence of transformations $\{F_{i_1, \dots, i_n}\}$ converges uniformly to the
constant

$$\Phi(\{i_n\}) = \bigcap_{n=1}^{\infty} F_{i_1, \dots, i_n}(K).$$

The mapping $\Phi : I_m^{\mathbb{N}} \to K$ is clearly continuous, so the range of $\Phi$ is a compact
set called the attractor of the IFS $\mathcal{F}$.
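Condition (2.2) and the resulting attractor are easy to explore numerically on the line. The sketch below is illustrative only: it uses the two maps generating the middle-thirds Cantor set as a hypothetical example (they are not taken from the text), enumerates all words of length $n$, and measures the largest diameter of the images of $[0,1]$ under the composed maps, assuming every map is monotone so that the image of an interval is the interval between the images of its end points.

```python
import itertools


def word_images(maps, x0, n):
    """Points F_{i_1} o ... o F_{i_n}(x0), indexed by the word (i_1, ..., i_n)."""
    images = {}
    for word in itertools.product(range(len(maps)), repeat=n):
        x = x0
        for i in reversed(word):        # the innermost map F_{i_n} acts first
            x = maps[i](x)
        images[word] = x
    return images


def max_cell_diameter(maps, n, lo=0.0, hi=1.0):
    """sup over words of diam F_{i_1..i_n}([lo, hi]), valid when every map is monotone."""
    left = word_images(maps, lo, n)
    right = word_images(maps, hi, n)
    return max(abs(left[w] - right[w]) for w in left)


# Hypothetical example: the two maps generating the middle-thirds Cantor set.
cantor = [lambda x: x / 3.0, lambda x: x / 3.0 + 2.0 / 3.0]
for n in (1, 3, 6):
    print(n, max_cell_diameter(cantor, n))      # shrinks like 3 ** (-n)

approx_attractor = sorted(word_images(cantor, 0.5, 6).values())
print(len(approx_attractor), approx_attractor[:4])
```

Since the cell diameters shrink uniformly in the word, the points `word_images(maps, x0, n)` approximate the attractor to within that diameter whatever starting point `x0` is used.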

Theorem 2.1. Let $S$ be the attractor of $\mathcal{F}$. Then

i: For any nonempty compact set $C$ in $K$

$$\bigcup_{(i_1, \dots, i_n) \in I_m^n} F_{i_1, \dots, i_n}(C) \to S \tag{2.4}$$

as $n \to \infty$, in the Hausdorff metric;

ii: $S$ is the unique compact subset of $K$ such that

$$S = \mathcal{F}(S) = \bigcup_{i=1}^{m} F_i(S); \tag{2.5}$$

iii: For any $s \in S$ the set $\{F_{i_1, \dots, i_n}(s) : (i_1, \dots, i_n) \in I_m^n,\ n = 0, 1, \dots\}$ is
dense in $S$;

iv: if $s_{i_1, \dots, i_n}$ is the unique fixed point of $F_{i_1, \dots, i_n}$, then $S$ is the closure
of $\{s_{i_1, \dots, i_n} : (i_1, \dots, i_n) \in I_m^n,\ n = 1, 2, \dots\}$.

Proof. i) is straightforward since both $\mathcal{F}^n(K)$ and $\mathcal{F}^n(\{x\})$, for any $x \in K$, are
in a ball of arbitrarily small radius around $S$ in the Hausdorff metric for $n$
sufficiently large. By the definition of $\Phi$, since each $F_i$ is continuous, it is
$F_i(S) \subset S$, $\forall i \in I_m$. On the other hand if $s = \Phi(\{i_n\})$ then $s \in F_{i_1}(S)$. This
shows (2.5). Uniqueness follows from (2.4). iii) follows by taking $C = \{s\}$
in (2.4) and observing that (2.5) implies $S = \mathcal{F}^n(S)$, for any integer $n$. To
establish iv) first note that (2.2) makes impossible the existence of more than
one fixed point for each $F_{i_1, \dots, i_n}$. Hence iv) results from the continuity of $\Phi$,
since any sequence $\{i_n\} \in I_m^{\mathbb{N}}$ can be approximated by a periodic one and the
$\Phi$-images of periodic sequences are precisely all the fixed points of $F_{i_1, \dots, i_n}$,
for any integer $n$ and $i_k \in I_m$, $k = 1, \dots, n$. ∎


{ The geometry of attractors for a class of iterated function systems }

We will be particularly interested in characterizing those situations in
which $\Phi$ is invertible, that is, from each point $s$ in the attractor we can recover
in a unique way its code, that is the whole sequence of inputs $\{i_n\}$ which has
produced it through the mapping $\Phi$.

Theorem 2.2. $\Phi$ is invertible if and only if for each $i \in I_m$, $F_i$ restricted to $S$
is one-to-one and for $i_1 \ne i_2$, $F_{i_1}(S) \cap F_{i_2}(S) = \emptyset$. In this case $S$ is perfect and
totally disconnected.

Proof. The necessity is evident. For the sufficiency observe that $i_n$ can be
read by observing to which of the mutually disjoint sets $F_i(S)$, $i \in I_m$, the point
$(F_{i_{n-1}}^{-1} \circ \dots \circ F_{i_1}^{-1})(s)$ belongs. Finally, if $\Phi$ is invertible it is a homeomorphism
between $I_m^{\mathbb{N}}$ and $S$, hence the former being perfect and totally disconnected
$S$ is such. ∎

It should be clear that the previous theorem cannot be used to establish
that $\Phi$ is invertible if, as usual, $S$ is unknown. For this reason the following
corollary, whose proof is immediate, is more useful in practice.

Corollary to 2.2. Let $C$ be a compact subset of $K$ such that $F_i(C) \subset C$ for each
$i \in I_m$. Then $S \subset C$, hence the conditions of Theorem 2.2 are satisfied if they
hold in $C$.

Unfortunately, $S$ can be perfect and totally disconnected without $\Phi$
being invertible [7]. However the following result can be used in many
cases to rule out this possibility. To any family $\{B_i\}_{i=1}^{m}$ of $m$ subsets of $K$
we associate an undirected graph $\Gamma_m(\{B_i\}_{i=1}^{m})$ on the vertex set $I_m$, called
the intersection graph, having an edge connecting $i$ and $k$ if and only if
$B_i \cap B_k \ne \emptyset$. The proof of the next result is taken from that of [7] for the case
of weak contractions but again it relies only on (2.2).

Theorem 2.3. The attractor $S$ is connected if and only if the graph

$$\Gamma_m\big( \{F_i(S)\}_{i=1}^{m} \big)$$

is connected.
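The connectivity criterion is straightforward to test once the sets $F_i(S)$, or compact sets known to contain them, can be described. A minimal sketch follows, assuming the sets are closed intervals on the line given as `(left, right)` pairs; the helper names and the two examples are illustrative and not from the text.

```python
from collections import deque


def intersection_graph(sets, intersects):
    """Adjacency lists of the intersection graph: i ~ k iff sets[i] meets sets[k], i != k."""
    m = len(sets)
    return [[k for k in range(m) if k != i and intersects(sets[i], sets[k])]
            for i in range(m)]


def is_connected(adjacency):
    """Breadth-first search from vertex 0; an empty graph counts as connected."""
    if not adjacency:
        return True
    seen = {0}
    queue = deque([0])
    while queue:
        i = queue.popleft()
        for k in adjacency[i]:
            if k not in seen:
                seen.add(k)
                queue.append(k)
    return len(seen) == len(adjacency)


def intervals_meet(a, b):
    """Closed intervals (left, right) intersect iff the larger left end is <= the smaller right end."""
    return max(a[0], b[0]) <= min(a[1], b[1])


# Maps x/2 and x/2 + 1/2: S = [0, 1], and the images of S share the point 1/2.
halves = [(0.0, 0.5), (0.5, 1.0)]
# Maps x/3 and x/3 + 2/3: the images of [0, 1] are disjoint, hence so are the images of S.
thirds = [(0.0, 1.0 / 3.0), (2.0 / 3.0, 1.0)]
print(is_connected(intersection_graph(halves, intervals_meet)))
print(is_connected(intersection_graph(thirds, intervals_meet)))
```

In the second example the intervals are only known to contain the sets $F_i(S)$; disjointness of the larger sets is enough to conclude that the graph has no edges, in the spirit of the Corollary to 2.2.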

Proof. The proof consists of showing that $S$ is well-chained, that is for
any $\epsilon > 0$ and any choice of $s$ and $t \in S$ there exists a chain $s_k \in S$, $k = 0, \dots, l$,
with $s_0 = s$ and $s_l = t$, such that $d(s_k, s_{k-1}) < \epsilon$ for $k = 1, \dots, l$. This
will follow once we prove that for any integer $n$ the intersection graph
$\Gamma_{m^n}\big( \{F_{i_1, \dots, i_n}(S)\}_{(i_1, \dots, i_n) \in I_m^n} \big)$ is connected. In fact then a chain connecting $s$
and $t$ can be built in such a way that for $k = 1, \dots, l$, $s_{k-1}$ and $s_k$ both belong
to some common $F_{i_1, \dots, i_n}(S)$, hence their distance is smaller than $\epsilon$ for $n$
sufficiently large.

Finally such an intersection graph can be easily proved to be connected
by induction, observing that for any integer $n$ and any choice of
$(i_1, \dots, i_n) \in I_m^n$ the intersection graph $\Gamma_m\big( \{F_{i_1, \dots, i_n, i}(S)\}_{i=1}^{m} \big)$ is again
connected. Since the union of the above family is $F_{i_1, \dots, i_n}(S)$ it is easy to build
a path connecting $(i_1, \dots, i_n, i_{n+1})$ and $(i'_1, \dots, i'_n, i'_{n+1}) \in I_m^{n+1}$ from one
connecting $(i_1, \dots, i_n)$ and $(i'_1, \dots, i'_n) \in I_m^n$. ∎

The last two theorems allow one to obtain the following alternative
whenever some symmetry argument can be used to show that the assumption
is fulfilled.

Corollary  to  2.3.  Let  us  a  consider  an  IFS  5(a)  =  'Fj(a),i  c  Iml  made 
by  one-to-one  maps  which  satisfies  (2.2)  for  any  value  of  a  and  call  So  its 
attractor.  Suppose  that  whenever  there  exists  i  and  j,  different  from  i,  such 
that  Fi(a)(S„  )  r ,  F|(a)(So )  t  <!>,  then  the  whole  graph  r,„(;Fi(a)(So )'”’  , )  is 
connected.  Then  So  is  either  perfect  and  totally  disconnected  or  connected, 
and  this  happens  when  tPo  is  invertible  or  it  is  not,  respectively. 

With terminology taken from [1] we can call the set of all values of $a$ for which $S_a$ is connected the Mandelbrot set of the family of IFS $\mathcal{F}(a)$. Our main interest is to investigate such a set for families of IFS which will be introduced in the next section.

Let us also remark that, by a general result in [6], the invariant measure $\mu$ of a Feller-Markov process $\{S_n\}$ on a compact state space $K$ can be approximated by the empirical measure, that is, $\frac{1}{n}\sum_{k=1}^{n} \delta_{S_k}$ converges in distribution to $\mu$ with probability one, for any possible starting point $s_0 \in K$. This allows one to get an approximate picture of the measure $\mu$, hence of its support, by simulating $\{S_n\}$; moreover, if $s_0 \in S$ (e.g., the fixed point of some $F_i$), then such an empirical measure is supported by points belonging to $S$. This has been used to get the pictures shown in the paper.
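The simulation idea in the preceding remark can be sketched as a random iteration. The block below is only an illustration under stated assumptions: the three maps are a toy IFS on the probability simplex (each moves a point halfway towards one vertex), not the maps of this paper, and the indices are drawn uniformly rather than with state-dependent probabilities. The names `make_toy_maps` and `empirical_points` are hypothetical.

```python
import random

def make_toy_maps():
    """Toy IFS (an assumption): map i moves a point halfway towards vertex i."""
    def toward(i):
        def f(s):
            return tuple(0.5 * x + (0.5 if k == i else 0.0) for k, x in enumerate(s))
        return f
    return [toward(i) for i in range(3)]

def empirical_points(maps, s0, n, seed=0):
    """Return the orbit S_1, ..., S_n of the random iteration started at s0."""
    rng = random.Random(seed)
    s = s0
    orbit = []
    for _ in range(n):
        s = rng.choice(maps)(s)   # apply a randomly chosen map
        orbit.append(s)
    return orbit

maps = make_toy_maps()
# (1, 0, 0) is the fixed point of the first map, hence a point of the attractor,
# so every point of the orbit lies on the attractor as well.
orbit = empirical_points(maps, (1.0, 0.0, 0.0), 5000)
```

Plotting the points of `orbit` gives an approximate picture of the support of the invariant measure of this toy chain; replacing the toy maps by the maps of interest is the only change needed.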


3.  Ternary  case 

In  this  final  section  we  consider  the  IFS  (1.2)  with  parameter  values  as  in 
(1.5). 

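The mappings $F_i$ can be rewritten as:

$F_i(s)(i) = \varphi(s_i), \quad i = 1,2,3$ (3.1)

$F_i(s)(j) = \psi(s_i, s_j), \quad i, j = 1,2,3,\ j \ne i$ (3.2)

where:

$\varphi(s_1) = \frac{(1-p)(1-\varepsilon)\,s_1 + \frac{p\varepsilon}{4}(1-s_1)}{(1-\varepsilon)\,s_1 + \frac{\varepsilon}{2}(1-s_1)}, \quad 0 \le s_1 \le 1,$

$\psi(s_1, s_2) = \frac{\frac{p(1-\varepsilon)}{2}\,s_1 + \frac{(1-p)\varepsilon}{2}\,s_2 + \frac{p\varepsilon}{4}(1-s_1-s_2)}{(1-\varepsilon)\,s_1 + \frac{\varepsilon}{2}(1-s_1)},$

for $s_1, s_2 \ge 0$ and $s_1 + s_2 \le 1$. The trivial case in which $p = 2/3$ is excluded in the sequel, hence we can assume that the mappings $F_i$ are one-to-one.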


Remark 3.1. We will have to check that the image under $F_i$ of some polygon $G$, with edges parallel to those of $I_3$, is contained in another polygon $G_0$ of the same kind. Since $F_i$ maps $I_3$ homeomorphically onto its image it will be enough to check that $F_i(\partial G) = \partial F_i(G) \subset G_0$, since in each case it can be trivially established that $G$ is mapped into the appropriate side of the boundary. Moreover, since each component of $F_i$ is monotone on every segment, it will be enough to verify that the vertices of $G$ are mapped into $G_0$ by $F_i$.

It is immediately checked that:

$F_j = \pi_{ij} \circ F_i \circ \pi_{ji},$

where $\pi_{ij}$ exchanges the $j$-th with the $i$-th coordinate, leaving the other fixed. This implies that the attractor $S$ is invariant under coordinate permutations. Therefore Corollary 2.5 holds.

Now let us suppose that the signal chain is persistent, that is $0 < p < 2/3$. This case has already been covered by [5], but for the sake of completeness we report the full proof here.

Each $F_i$ is one-to-one onto the triangle with vertices $(1-p, p/2, p/2)$ and its permutations, and its fixed point has the form

$s_i^*(j) = (1-c)\,\delta_{ij} + \frac{c}{2}\,(1-\delta_{ij}), \quad 0 < c < 2/3.$ (3.3)

In fact, being the normalized eigenvector associated to the largest eigenvalue of the matrix $P^T D_i$, it is

$(1-p)(1-\varepsilon) + \frac{p\varepsilon}{4}\,\frac{c}{1-c} = p\,(1-\varepsilon)\,\frac{1-c}{c} + \frac{\varepsilon}{2}\Bigl(1-\frac{p}{2}\Bigr).$

Since the l.h.s. is increasing and the r.h.s. is decreasing in $c$, and the former exceeds the latter for $c = 2/3$ by the positive amount $(1 - \frac{3}{2}\varepsilon)(1 - \frac{3}{2}p)$, (3.3) is proved. Hence the triangle

$I_1 = \{(s_1, s_2, s_3) : c/2 \le s_i \le 1 - c,\ i = 1,2,3\},$

with vertices $s_1^*$, $s_2^*$ and $s_3^*$, is not empty and has the property

$F_i(I_1) \subset I_1, \quad i = 1,2,3.$ (3.4)

It suffices to show this for $i = 1$. In view of the preceding Remark we have only to show that $F_1(s_2^*)$ is in $I_1$ (in which case also $F_1(s_3^*)$ will be in $I_1$ by symmetry). First, since $1 - c$ is the fixed point of the increasing function $\varphi$, it is

$c/2 \le F_1(s_1, s_2, s_3)(1) \le 1 - c$

for $c/2 \le s_1 \le 1 - c$. Moreover for $0 < p < 2/3$ it is $F_1(s_2^*)(2) > F_1(s_2^*)(3)$. Finally

$F_1(s_2^*)(3) > c/2$ (3.5)

holds if and only if $c > p$, as it results by taking into account that $F_1(s_1^*)(3) = c/2$ and subtracting each side from the corresponding one in (3.5). But since $c > p$ in the range of values of $p$ and $c$ we are considering, (3.4) is proved.
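The fixed point (3.3) and the invariance (3.4) can be probed numerically. The sketch below assumes the closed forms of $\varphi$ and $\psi$ in (3.1) and (3.2) as read here, and uses hypothetical parameter values $p = 0.3$, $\varepsilon = 0.2$; the helper names (`phi`, `psi`, `F`, `fixed_c`) are illustrative only.

```python
def phi(s, p, eps):
    """Diagonal component of F_i, following (3.1) (assumed form)."""
    num = (1 - p) * (1 - eps) * s + p * eps / 4 * (1 - s)
    return num / ((1 - eps) * s + eps / 2 * (1 - s))

def psi(s1, s2, p, eps):
    """Off-diagonal component of F_i, following (3.2) (assumed form)."""
    num = p * (1 - eps) / 2 * s1 + (1 - p) * eps / 2 * s2 + p * eps / 4 * (1 - s1 - s2)
    return num / ((1 - eps) * s1 + eps / 2 * (1 - s1))

def F(i, s, p, eps):
    """Apply the map with 0-based index i to a point s of the simplex."""
    return tuple(phi(s[i], p, eps) if j == i else psi(s[i], s[j], p, eps)
                 for j in range(3))

def fixed_c(p, eps, steps=200):
    """Bisection for c in (0, 2/3) such that phi(1 - c) = 1 - c."""
    lo, hi = 0.0, 2.0 / 3.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        if phi(1 - mid, p, eps) < 1 - mid:
            lo = mid   # the root lies to the right of mid
        else:
            hi = mid
    return (lo + hi) / 2

p, eps = 0.3, 0.2                     # hypothetical persistent-case values
c = fixed_c(p, eps)
s1 = (1 - c, c / 2, c / 2)            # candidate fixed point of the first map
s2 = (c / 2, 1 - c, c / 2)            # fixed point of the second map, a vertex of I_1
image_of_s1 = F(0, s1, p, eps)
image_of_s2 = F(0, s2, p, eps)
```

Here `image_of_s1` should reproduce `s1` up to rounding, and the coordinates of `image_of_s2` can be compared with the bounds `c / 2` and `1 - c` that define $I_1$.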

By (3.4) and Corollary 2.3 it is necessarily $S \subset I_1$. But (3.5) also ensures that we will never get the equality in (3.4), hence $S$ will never coincide with $I_1$ and it will never be a triangle, either. We can now prove the following theorem:

Theorem 3.2. The attractor $S$ for the IFS (1.2) and (1.4) is perfect and totally disconnected if and only if


otherwise  it  is  connected. 

Proof. We will show that $S$ is perfect and totally disconnected if and only if

$F_1(s_2^*)(2) < F_1(s_2^*)(1)$ (3.7)

which, once written down explicitly, is shown to be equivalent to $c > \varepsilon$. This happens if and only if, by replacing $c$ with $\varepsilon$ in the equation following (3.3), the r.h.s. exceeds the l.h.s. After a number of simplifications (3.6) is obtained.
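The simplifications mentioned in the last step can be sketched as follows. This is a reconstruction under assumptions, not the authors' computation: it takes $s_2^* = (c/2, 1-c, c/2)$, reads (3.7) as $\psi(c/2, 1-c) < \varphi(c/2)$ with $\varphi$ and $\psi$ in the form $\varphi(s) = \frac{(1-p)(1-\varepsilon)s + \frac{p\varepsilon}{4}(1-s)}{(1-\varepsilon)s + \frac{\varepsilon}{2}(1-s)}$ and $\psi(s,t) = \frac{\frac{p(1-\varepsilon)}{2}s + \frac{(1-p)\varepsilon}{2}t + \frac{p\varepsilon}{4}(1-s-t)}{(1-\varepsilon)s + \frac{\varepsilon}{2}(1-s)}$, and assumes $0 < p < 2/3$ and $0 < \varepsilon < 2/3$. The resulting inequality should be checked against (3.6).

```latex
% Step 1: (3.7) is equivalent to c > epsilon.
% Both sides share the same positive denominator; multiply the numerators by 8.
\begin{align*}
\psi(c/2,\,1-c) < \varphi(c/2)
 &\iff 2p(1-\varepsilon)c + 4(1-p)\varepsilon(1-c) + p\varepsilon c
      < 4(1-p)(1-\varepsilon)c + p\varepsilon(2-c) \\
 &\iff (6p-4)\,(c-\varepsilon) < 0 \\
 &\iff c > \varepsilon \qquad (\text{since } p < 2/3).
\end{align*}

% Step 2: put c = epsilon in the fixed-point equation and require r.h.s. > l.h.s.
\begin{align*}
p\,\frac{(1-\varepsilon)^2}{\varepsilon} + \frac{\varepsilon}{2}\Bigl(1-\frac{p}{2}\Bigr)
  &> (1-p)(1-\varepsilon) + \frac{p\,\varepsilon^2}{4(1-\varepsilon)} \\
\iff\ p\,\bigl[4(1-\varepsilon)^2 - \varepsilon^2\bigr]
  &> 2\varepsilon(1-\varepsilon)(2-3\varepsilon)
  \qquad (\text{after multiplying by } 4\varepsilon(1-\varepsilon)) \\
\iff\ p\,(2-3\varepsilon)(2-\varepsilon)
  &> 2\varepsilon(1-\varepsilon)(2-3\varepsilon) \\
\iff\ p &> \frac{2\varepsilon(1-\varepsilon)}{2-\varepsilon}
  \qquad (\text{since } \varepsilon < 2/3).
\end{align*}
```

Under these assumptions the condition for total disconnectedness in the persistent case would read $p > 2\varepsilon(1-\varepsilon)/(2-\varepsilon)$.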

Now assume (3.7). For any $s \in I_1$ denote by $Q_s$ and $P_s$ the sets
Qs  ■-  !i)  •  li  :  Vi2  :  .sj.Vd  '  So  1 3.8) 

Ps  ly  e  h  :  Ui  .S2,yt  p  .si,.  (3.9) 

Then 

Qi.-..-  13.101 

By the introductory Remark it suffices to prove (3.10) with $P_s$ and $Q_s$ replaced by the segments forming their boundary. Next, the first is verified by using the monotonicity of the components of $F_1$ on each segment and checking that both

-  0(,Sa -1- aHilic  c 
da 

and 

-^F,(si  +  a,S2,S2  -  a)li)la  o.  = 
da 

have the right signs. The second follows by symmetry. In particular for $s = s_2^*$ and $s = s_1^*$ we get

$F_1(I_1) \subset P_{F_1(s_2^*)}, \quad F_2(I_1) \subset Q_{F_2(s_1^*)}.$

Since $F_1(s_2^*)(2) = F_2(s_1^*)(1)$, from (3.7) it follows that $P_{F_1(s_2^*)}$ and $Q_{F_2(s_1^*)}$ are disjoint, from which

$F_1(I_1) \cap F_2(I_1) = \emptyset.$

Hence for $i \ne j$

$F_i(I_1) \cap F_j(I_1) = \emptyset,$

which implies by Corollary 2.3 that $S$ is perfect and totally disconnected.
Suppose now

$F_1(s_2^*)(1) = F_2(s_1^*)(2) \le F_1(s_2^*)(2);$ (3.11)

we will construct a sequence $x_k = F_{i_1 \cdots i_k}(s_1^*) = \Phi(i_1 \cdots i_k \bar{1})$ of points in $I_1$ converging to a point $s$ in $M = \{x : x(1) = x(2)\}$, where each $i_k$ is either 1 or 2. The construction builds successively the digits $i_k$ in the code of $s$. Let us start from the code $\bar{1}$ of the periodic point $s_1^*$ and begin to substitute, starting from the left, the symbol 1 with 2 and conversely until it happens that

$\Phi(\sigma \bar{1}) \in L$ but $\Phi(\sigma 2 \bar{1}) \in R,$

where $\sigma$ is a list of 1's and 2's (for the first step it consists only of 2's but we need it for the subsequent steps).

It is clear that sooner or later this will happen, since substituting all the symbols of the address of $s_1^*$ we obtain $s_2^* \in R$.

Now we keep the first digit whose change would lead us into $R$, and keep on trying to change the following digit in the code. Either the procedure ends in a finite number of steps at a point in $M$, or an infinite number of digits will not be changed (because when changed the corresponding point would be in $R$), and by continuity of $\Phi$ this will prove the convergence to a point in $M$. The above statement is established provided we show

$\Phi(\sigma 2 \bar{1}) \in R \implies \Phi(\sigma 1 \bar{2}) \in R,$

that is, keeping the symbol that once changed would lead us into $R$ and changing all the following ones we come back into $R$.

Let $z_1 = F_2(s_1^*)$ and $z_2 = F_1(s_2^*)$; because of (3.11) we will have that:

$z_2 \in Q_{z_1}$ or equivalently $z_1 \in P_{z_2}$.

Let $G(x) = F_\sigma(x)$. Then $G(z_2) \in Q_{G(z_1)}$ (equivalently $G(z_1) \in P_{G(z_2)}$). Since $G(z_1) = \Phi(\sigma 2 \bar{1}) \in R$, by (3.10) we obtain $G(z_2) = \Phi(\sigma 1 \bar{2}) \in Q_{G(z_1)} \subset R$.

It is clear that this argument, with the appropriate choice of $G$ as a composition of $F_1$ and $F_2$, allows one to construct the desired sequence $x_k = \Phi(\sigma \bar{1})$.

Now let us consider the repulsive case when

$2/3 < p < 1.$ (3.12)

Since the triangle having the fixed points of the $F_i$'s as vertices is not invariant anymore, we turn our attention to the fixed points $s^*_{ij}$ of the compositions $F_i \circ F_j$, for $i \ne j$.

Let $s^*_{23} = (x^*, y^*, z^*)$; since

$F_3(s^*_{23}) = s^*_{32} = (x^*, z^*, y^*),$

then:

$z^* = \psi(z^*, y^*)$

$y^* = \varphi(z^*)$ (3.13)

$x^* = \psi(z^*, x^*).$

First of all, notice that:

$(1 - p) \le y^* \le x^* \le z^* \le p/2$ (3.14)

for the range of parameter values we are considering.

To establish this observe first that $z^*$ is the unique solution of the equation:

$z = \psi(z, \varphi(z)).$

Such an equation can be written as:

$\varphi(z) = \frac{(p - 2z)\,((2 - 3\varepsilon)z + \varepsilon)}{\varepsilon\,(3p - 2)} = r(z)$ (3.15)

where:

$r(0) = \frac{p}{3p - 2} > \varphi(0) = \frac{p}{2}$

and

$r(1) < 0.$

At the unique fixed point $1 - c$ of $\varphi$ it is:

(i-€)(i  +  -  c)  =  (i-r)n  fid 

hence by substituting into the r.h.s. of (3.15) we get:

r(l  -  C)  =  --C)  >  1  -  c. 

e/2 

from which it follows $z^* > 1 - c$. Since $y^* = \varphi(z^*)$ and $\varphi$ is decreasing, then $y^* < 1 - c$.

By subtracting the first from the third of (3.13) we get that $z^* - x^*$ and $x^* - y^*$ have the same sign, hence (3.14) is established. It is clear now that the hexagon:

$I_2 = \{s : y^* \le s_i \le z^*,\ i = 1,2,3\},$ (3.16)

having $\{s^*_{ij} : i \ne j\}$ as vertices is nonempty.
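The ordering (3.14) can be checked numerically for a given pair of parameters. The sketch below assumes the closed forms of $\varphi$ and $\psi$ in (3.1) and (3.2) as read here, takes hypothetical values $p = 0.8$, $\varepsilon = 0.2$, and obtains the period-two point by iterating the composition of the second and third maps from the barycentre; labelling the limit $s^*_{23}$ is an assumption about the indexing convention.

```python
def phi(s, p, eps):
    """Diagonal component of F_i, following (3.1) (assumed form)."""
    num = (1 - p) * (1 - eps) * s + p * eps / 4 * (1 - s)
    return num / ((1 - eps) * s + eps / 2 * (1 - s))

def psi(s1, s2, p, eps):
    """Off-diagonal component of F_i, following (3.2) (assumed form)."""
    num = p * (1 - eps) / 2 * s1 + (1 - p) * eps / 2 * s2 + p * eps / 4 * (1 - s1 - s2)
    return num / ((1 - eps) * s1 + eps / 2 * (1 - s1))

def F(i, s, p, eps):
    """Apply the map with 0-based index i to a point s of the simplex."""
    return tuple(phi(s[i], p, eps) if j == i else psi(s[i], s[j], p, eps)
                 for j in range(3))

def period_two_point(p, eps, iterations=500):
    """Iterate F_2 after F_3 (1-based names) starting from the barycentre."""
    a = (1 / 3, 1 / 3, 1 / 3)
    for _ in range(iterations):
        a = F(1, F(2, a, p, eps), p, eps)
    return a

p, eps = 0.8, 0.2                      # hypothetical repulsive-case values
x, y, z = period_two_point(p, eps)
swapped = F(2, (x, y, z), p, eps)      # expected to equal (x, z, y)
ordering_holds = (1 - p) <= y <= x <= z <= p / 2
```

The tuple `swapped` should agree with `(x, z, y)`, which is the relation used to derive (3.13), and `ordering_holds` records whether (3.14) is satisfied for these values.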

Let us show the following:

Theorem 3.3. For each $i$ it is

$F_i(I_2) \subset I_2.$ (3.17)

Proof. By symmetry we need to show this only for the case $i = 3$.

By the introductory remark, it is enough to look at the images of each of the segments of the boundary of $I_2$. Since $\psi(s, t)$ is nonincreasing in $t$ for any $s$, $F_3$ maps $L$ inside $R$; hence it suffices to look at the part of the boundary of $I_2$ which is inside $L$.

By the uniqueness of the fixed point of $\varphi$, $1 - c$, the interval $[y^*, z^*]$ is invariant under $\varphi$.

Let  us  now  consider  the  images  of  the  oriented  segments  [sj , ,  s  j ,  j.  First 

of  all: 

$\varphi(x^*) < x^*,$ (3.18)

this is proved in the following way. The solution of $\psi(z, x) = x$ is

$x = \frac{p\,[(2 - 3\varepsilon)z + \varepsilon]}{2(2 - 3\varepsilon)z + 3\varepsilon p}$

which is increasing in $z$ and is $1/3$ in $z = 0$. Therefore $x^* > 1/3$; since $\varphi$ is decreasing and its fixed point $1 - c$ is smaller than $1/3$ when $p > 2/3$, (3.18) follows.

F-'roni  (3.18) 

F.,|s],)(3)  <  s:,,l3), 

moreover  it  is  easy  ti'  verify  that: 


hK(  I  } 

C  from 


1il.sj,ll2l'  Fils}, 112)-  .';’,_.|2|  -  z'. 

which  finally  shows  that  f  <  .sji U  :  Iz 

\ow  look  at  ,  s\| ).  It  is  enough  to  verifv  that: 

)|2|  .  tl'ly’.z"  )  •  ij‘  il)|z' ). 

It is immediately checked that the opposite inequality holds for the denominator, whereas the correct inequality holds for the numerator as soon as

z-  p  2 
ijdzM  I  r 
which  is  always  satisfied. 

I'inallv  observe  that  .sj,,s'|,  is  mapped  bv  I  t  into  the  segment 
lsj_,. .s,p,  and  the  latter  is  mapped  into  a  segment  having  third  component 
costant  with  endpoints  internal  to  I.'. 
l  et  us  work  out  the  condition: 


i,i.s:,i(n  >i,is-,)iii  i.b’i 

which will be shown to be necessary and sufficient for disconnectedness. For the moment notice that the above is equivalent to:

ll'lvi'.z'  I  (|)(l)‘i 

which  after  some  algebraic  manipulations  become: 

II  c)(l  '■  II  *  '  ^lY^  )"' 

where  z‘  is  the  unique  solution  of  i|)|zi  r(z). 
l.et  us  set  l(zj  ,  z. 

Now $r$ is a quadratic polynomial with $r(0) > 0$ and $r(p/2) = 0$, hence the equation

$r(z) = l(z)$

has a unique positive solution in $z$, with:

0<z<| 

Now $\varphi$ and $l$ are monotone and continuous, and $\varphi(0) = p/2 < r(0)$; hence $\varphi$ and $r$ meet necessarily at a point in which $r$ decreases, and afterwards $\varphi$ dominates $r$, which implies, denoting by $\bar{z}$ the solution of $r(z) = l(z)$:

$\varphi(\bar{z}) > l(\bar{z}) \iff z^* < \bar{z} \iff \varphi(z^*) > l(z^*).$

So it is sufficient to find when the first inequality is true. The relations:

1)  r(z)  -  Ui) 

2)  4)(z|  >  l(z| 

can  be  rewritten  as; 

2(1  -Clip -2z):(2-  3clz-*  c-  .-c’l3p  -21z  (3.22! 

(pe  (4  -  4c  -  4p  ^  3pc!z;  (1  -  d  .•  cz((2  3c)z  "  c’  (3,21 ! 


Now, multiplying both sides of (3.20) by $\varepsilon$ and both sides of (3.21) by $4(1 - \varepsilon)$ and then summing we obtain:


r(£.l  <  1(2.) 


with: 

,  2(1  clep 

"  Ap  B 

and 

A  ^  3c‘  8l  r  8 
B  =  6e^  -  12e  +  8 


(3.231 


After some algebraic manipulations, we finally get that (3.19) is equivalent to:


jl(3e^  -  6e_  f  4^^ 

"  (2  ■-eT(.k^^-87+8) 

These considerations lead us to:


(3.24) 


Theorem 3.4. The attractor of the IFS (3.1), (3.2) with $2/3 < p < 1$ is totally disconnected and perfect if and only if (3.24) holds.

First of all we establish

$F_1(P_s) \subset Q_{F_1(s)}, \quad F_1(Q_s) \subset P_{F_1(s)}$ (3.25)

where $P_s$ and $Q_s$ are defined as in (3.9) and (3.8) with $I_1$ replaced by $I_2$. Because of our introductory Remark it suffices to check that

1) $\frac{d}{d\alpha} F_1(s(1), s(2) - \alpha, s(3) + \alpha)(i)\big|_{\alpha = 0}$

2) $\frac{d}{d\alpha} F_1(s(1) + \alpha, s(2), s(3) - \alpha)(i)\big|_{\alpha = 0}$

have the right sign for $i = 1, 2$.

By symmetry one obtains:

$F_2(P_s) \subset Q_{F_2(s)}, \quad F_2(Q_s) \subset P_{F_2(s)}.$ (3.26)

Now it is immediately seen that:

Fidi)  C  Fi(Q,.,  )  C  P,,,:s:.,,  F,(l2l  C  F,(P.;,)  '  Q,,,.;  . 

and  since  (3.19)  is  ec]uivalent  to 

and by symmetry, for each $i \ne j$:

$F_i(I_2) \cap F_j(I_2) = \emptyset,$

hence the attractor is totally disconnected and perfect.

For  the  case 

F,(s;/)(2)  ;;  FdsijX'l 

we will construct again a sequence $x_k = F_{i_1 \cdots i_k}(s^*_{21})$ of points in $I_2$ converging to a point $s$ in $M$, where each $i_k$ is either 1 or 2. The construction builds successively the digits $i_k$ in the code of $s$. Let us start from the code $\overline{21}$ of the periodic point $s^*_{21}$ and begin to substitute, starting from the left, the symbol 1 with 2 and conversely until either

1)  (PlCTlUl)  1  hut  0(01221)  •;  R,  or 

2)  0|ct21)  hut  0((t1T2)  -  K 

will happen, where $\sigma$ is a list of digits 1 and 2 of even length.

It is clear that sooner or later this will happen, since substituting all the symbols of the address of $s^*_{21}$ we obtain $s^*_{12} \in R$.

Now we keep the first digit whose change would lead us into $R$ (1 in case 1, 2 in case 2) and keep on trying to change the following digit in the code. Now, either the procedure ends in a finite number of steps at a point in $M$, or an infinite number of digits will not be changed (because when changed the corresponding point would be in $R$), and by continuity of $\Phi$ the sequence converges to a point in $M$. This is proved once it is established that:

1) $\Phi(\sigma 1 2 \overline{21}) \in R \implies \Phi(\sigma 1 1 \overline{12}) \in R$

2) $\Phi(\sigma 1 \overline{12}) \in R \implies \Phi(\sigma 2 \overline{21}) \in R$

that is, keeping the symbol whose change led us into $R$ and changing all the following ones we come back into $R$.

Let $z_1 = F_1(s^*_{12})$ and $z_2 = F_2(s^*_{21})$; because of (3.27) we will have that $z_1 \in P_{z_2}$, or equivalently $z_2 \in Q_{z_1}$.

1) Let $G(x) = F_\sigma(F_1(x))$. Since $G(z_2) = \Phi(\sigma 1 2 \overline{21}) \in R$, then, since the number of the maps is odd and by (3.25) and (3.26), we obtain $G(z_1) = \Phi(\sigma 1 1 \overline{12}) \in Q_{G(z_2)} \subset R$;

2) Let $G(x) = F_\sigma(x)$. Since $G(z_1) = \Phi(\sigma 1 \overline{12}) \in R$, then, since the number of the maps is even and by (3.25) and (3.26), we obtain $G(z_2) = \Phi(\sigma 2 \overline{21}) \in Q_{G(z_1)} \subset R$.

The interested reader will raise the question of what happens for $2/3 < \varepsilon < 1$. In this case we have not been able to find a simple invariant set as in the previous cases. However, on the basis of numerical experiments we conjecture that for $2/3 < p < 1$ the attractor is always totally disconnected and perfect, whereas for $0 < p < 2/3$ both cases should appear, but the codes of contact points ($1\bar{2}$ and $2\bar{1}$ in the persistent and $1\overline{12}$ and $2\overline{21}$ in the repulsive case) seem to vary from case to case, suggesting the possibility of a more complicated behaviour.

Problems


1. Wavelets on $H^2$ (Stéphane Jaffard)

Recall that the Hardy space $H^2$ is the subspace of $L^2$ composed of functions whose Fourier transforms vanish for $\xi \le 0$. The problem is to find an orthonormal basis of $H^2$ of the form $2^{j/2}\psi(2^j x - k)$ where $\psi$ would be smooth and well localized (the maximal localization and smoothness possible would also have to be determined). Such a construction would be useful in mathematical analysis and signal analysis. One of the difficulties lies in the following result (cf. [2, Chapter 3]):

Theorem 1.1. There exists no multiresolution analysis on $H^2$ which would give a wavelet $\psi$ such that $\hat\psi$ is continuous and

$|\hat\psi(\xi)| \le C\,|\xi|^{-\alpha}$ for any $\alpha > 1/2$.

Thus the difficulty, if such a basis exists, lies in constructing a wavelet basis which could not be obtained through a multiresolution analysis. Such examples, even for $L^2$, have not been found (to our knowledge).

2. Wavelets on domains (Stéphane Jaffard)

A key point for the resolution of PDE's using wavelets is to construct wavelets which are an orthonormal basis of $L^2(\Omega)$ and bases of Sobolev spaces on the domain. The existing results are the following:

On a segment (or a square, a cube, ...) there exists an orthonormal basis of compactly supported wavelets of the form $2^{j/2}\psi(2^j x - k)$, except for $N$ functions at each level $j$ localized near the end points of the segment, where the function $\psi$ has to be modified (cf. [5]). There exist such bases which are also bases for the $H^s$ spaces or for the $H^s_0$ spaces, but for no other spaces (such as $H^1_0 \cap H^s$, for instance). For a general domain, there exists a similar basis with exponential decay, but all the wavelets have to be modified (though the correction exponentially decreases as a function of $2^j\,\mathrm{dist}(k2^{-j}, \partial\Omega)$), and they are bases of the $H^s_0$. A non-smooth basis which works for $H^s$ also exists (cf. [1,4]).

Bases of the first kind which would work for more general domains, and bases, even of the second kind, but which would work for a given Sobolev space, or could be adapted to mixed boundary conditions, would be extremely useful.

J. S. Byrnes et al. (eds.), Probabilistic and Stochastic Methods in Analysis, with Applications, 687-690.

4. (Hervé Queffélec)

Let  Ni  be  an  integer  Y  1  Does  there  exist  a  sequence  Ci,  . . Cn  with  icji  -  1 
and  sup,,  ,  c'lZ'' '  c-  Bv'N  where  B  is  a  numerical  constant?  Or  even 

sup  :  Y"  Cjz’'  j  -  of  v'N  log  N) 

i;i  1  , 

(If one takes random signs for the $c_j$, one gets $O(\sqrt{N \log N})$.) Or (still less
demanding)  can  one  find  a  sequence  of  positive  integers,  with  A„  ,  i  ■  A„ , 

)  A„  -» 'X',  and  tor  each  N  a  sequence  (c'i  )|..  i<,N  of  complex  numbers 
of  modulus  one  such  that 

N 

sup  i  ^  C,z''  !  0(  V  Yilog  N ) 

’  1 

Remark 4.1. The modified Rudin-Shapiro sequence $P_0 = Q_0 = 1$ and
P„M(i)  ^  P,.(i)  f  i'"QM(il 

Qm  I  1  (z|  Pniz)  -  z’  Qn(z) 

provides examples of sequences of integers $(\lambda_j)$ with density zero for which there exists $(c_j)$ with

$\sup_{|z| = 1} \Bigl|\sum_{j=1}^{N} c_j z^{\lambda_j}\Bigr| = O(\sqrt{N}).$
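For orientation, the flatness property behind such constructions can be seen on the classical Rudin-Shapiro pair. The sketch below is an assumption-laden illustration: it uses the standard recursion $P_{k+1}(z) = P_k(z) + z^{2^k} Q_k(z)$, $Q_{k+1}(z) = P_k(z) - z^{2^k} Q_k(z)$ with consecutive exponents, not the modified sequence of the remark, and the function names are hypothetical.

```python
import cmath
import math

def rudin_shapiro(n):
    """Coefficient lists of the classical Rudin-Shapiro pair (P_n, Q_n)."""
    p, q = [1], [1]
    for _ in range(n):
        # P_{k+1} = P_k + z^(2^k) Q_k and Q_{k+1} = P_k - z^(2^k) Q_k:
        # multiplying by z^(2^k) shifts Q_k past the coefficients of P_k.
        p, q = p + q, p + [-c for c in q]
    return p, q

def sup_on_circle(coeffs, samples=4096):
    """Maximum modulus of the polynomial over a uniform grid on |z| = 1."""
    best = 0.0
    for m in range(samples):
        z = cmath.exp(2j * math.pi * m / samples)
        value = 0j
        for c in reversed(coeffs):   # Horner evaluation
            value = value * z + c
        best = max(best, abs(value))
    return best

p, q = rudin_shapiro(6)              # 64 coefficients, all equal to +1 or -1
grid_sup = sup_on_circle(p)
bound = math.sqrt(2 * len(p))        # from |P_n|^2 + |Q_n|^2 = 2^(n+1) on |z| = 1
```

The value `grid_sup` stays below `bound`, that is, below $\sqrt{2N}$ for $N = 2^n$ unimodular coefficients, which is the $O(\sqrt{N})$ behaviour referred to above in the dense-exponent case.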


5. One-sided sampling (Harold Shapiro)


When a bandlimited function $f(t)$ on $\mathbb{R}$ is sampled at equally-spaced values of $t$, faster than the Nyquist rate, then $f$ (assumed, say, to belong to $L^2(\mathbb{R})$) can be reconstructed from these samples, as is well-known, by a procedure that is stable with respect to small fluctuations in the data. Actually, the redundancy of the sampled information is so great that even the one-sided samples, i.e., the sample values $f(t)$ for $t > T$ (where $T$ is any prescribed number) suffice in principle for the reconstruction of $f$; but now the reconstruction process leads to an ill-posed problem in the sense of Hadamard, i.e., tiny changes in the sampled values can lead to huge changes in the reconstructed $f$, and this instability gets worse the larger $T$ is, and the closer the sampling rate to the Nyquist rate. The situation seems reminiscent of reconstructing a hologram from its fragment.
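The stable two-sided reconstruction mentioned above can be illustrated with the cardinal (sinc) series. The sketch below is a toy under stated assumptions: the test signal $f(t) = \mathrm{sinc}(t/2)^2$, the spacing $h = 0.5$ (twice the Nyquist rate for this signal) and the truncation at $|n| \le 2000$ are all illustrative choices, and the function names are hypothetical.

```python
import math

def sinc(x):
    """Normalised sinc: sin(pi x) / (pi x), with value 1 at x = 0."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def f(t):
    """Test signal (an assumption): band-limited to |omega| <= pi, Nyquist spacing 1."""
    return sinc(t / 2) ** 2

def reconstruct(t, h=0.5, n_max=2000):
    """Truncated cardinal series built from the two-sided samples f(n h)."""
    return sum(f(n * h) * sinc(t / h - n) for n in range(-n_max, n_max + 1))

t = 0.3
error = abs(reconstruct(t) - f(t))
```

The value of `error` is limited only by the truncation of the series. The formula uses the samples on both sides of the origin; it does not by itself address the one-sided problem, in which the samples with $n \le 0$ are unavailable.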

Before posing our question, let us recapitulate the (very well-known) ideas that substantiate what has just been said. As our basic setup, let us consider the case $T = 0$, and assume

$f(t) = \frac{1}{2\pi} \int_{-a}^{a} \varphi(\omega)\, e^{-i\omega t}\, d\omega$

where $a$, with $0 < a < \pi$, is given and $\varphi \in L^2(-a, a)$, and that the numbers

$c_n = \frac{1}{2\pi} \int_{-a}^{a} \varphi(\omega)\, e^{-in\omega}\, d\omega$

are given for integral $n > 0$, and that we wish to calculate $f$ (or what is equivalent, and more convenient for us, $\varphi$).

Let us define

$\tilde\varphi(\omega) = \varphi(\omega)$ for $|\omega| \le a$, $\quad \tilde\varphi(\omega) = 0$ for $a < |\omega| \le \pi$.

Then, the uniqueness of the reconstruction is easy for, if $c_n = 0$ for $n > 0$, the Fourier coefficients of $\tilde\varphi$, which is in $L^2(\Pi)$ ($\Pi$ being the interval $(-\pi, \pi]$, identified as the unit circle) vanish for positive $n$, so $\tilde\varphi$ is the complex-conjugate of a function in the Hardy space $H^2(\Pi)$ and, vanishing for $a < |\omega| \le \pi$, it vanishes identically.

To perform the reconstruction, let $\psi \in L^2(\Pi)$ be the function with Fourier expansion

$\psi = \sum_{n \ge 1} c_n e^{in\omega}.$

Our problem is to find $\tilde\varphi \in L^2(\Pi)$ such that

1) $\tilde\varphi$ vanishes a.e. on the arc $\Gamma : a < |\omega| \le \pi$ of the unit circle;

2) $\tilde\varphi - \psi$ has vanishing Fourier coefficients of positive index (and hence, is of the form $\bar{h}$ where $h \in H^2(\Pi)$).

Thus, the problem reduces to finding $h \in H^2(\Pi)$ which, on the given proper sub-arc $\Gamma$ of the unit circle, is determined a.e. by the known function $\psi \in L^2(\Gamma)$ (on $\Gamma$, $\bar{h} = -\psi$). This is a case of the classical problem of Carleman, to reconstruct an analytic function in a plane domain, in terms of the values on a proper part of its boundary. Many papers have been devoted to this question. It is a well-known "badly posed problem" and can be attacked numerically, e.g., by use of Tikhonov's method of regularization.
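The stabilising effect of Tikhonov regularization can be seen on a small generic example. The sketch below is not the Carleman problem itself: it uses an 8 by 8 Hilbert matrix as a stand-in for an ill-conditioned forward operator, a hypothetical regularization parameter `lam = 1e-8`, and a hand-written linear solver so that the block needs only the standard library.

```python
import math

def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def tikhonov(a, b, lam):
    """Minimise |a x - b|^2 + lam |x|^2 through the regularized normal equations."""
    n = len(a[0])
    ata = [[sum(row[i] * row[j] for row in a) + (lam if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    atb = [sum(row[i] * rhs for row, rhs in zip(a, b)) for i in range(n)]
    return solve(ata, atb)

n, lam = 8, 1e-8
hilbert = [[1.0 / (i + j + 1) for j in range(n)] for i in range(n)]
clean = [sum(row) for row in hilbert]            # data generated by x = (1, ..., 1)
noise = [1e-6 * (-1) ** i for i in range(n)]     # small deterministic perturbation
noisy = [c + e for c, e in zip(clean, noise)]

x_clean = tikhonov(hilbert, clean, lam)
x_noisy = tikhonov(hilbert, noisy, lam)
change = math.dist(x_clean, x_noisy)
bound = math.hypot(*noise) / (2 * math.sqrt(lam))
```

The regularized solution depends continuously on the data: `change` cannot exceed `bound`, the data perturbation divided by $2\sqrt{\lambda}$, whereas the unregularized inverse of the same matrix carries no comparable guarantee. The price is a bias that grows with `lam`, so the parameter has to be matched to the noise level.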

We  now  come  to  our  questions.  First  of  all,  can  anything  similar  be 
done  if  the  one-sided  samples  are  not  equally  spaced  (but,  sufficiently  dense 
to  imply  uniqueness  of  f;  conditions  for  this  are  known)?  Also,  how  is  the 
above-sketched  solution  (even  in  the  equally-spaced  case)  to  be  modified  if 
f  is  not  a  deterministic  function  but  a  stationary  Gaussian  process,  and  noise 
is present?